Estimating the coefficients of a noisy polynomial phase signal
is important in fields including radar, biology and radio communi- 
cations. One approach attempts to perform polynomial regression 
on the phase of the signal. This is complicated by the fact that the 
phase is wrapped modulo $2\pi$ and must be unwrapped before regression
can be performed. In this paper we consider an estimator that per- 
forms phase unwrapping in a least squares manner. We describe the 
asymptotic properties of this estimator, showing that it is strongly 
consistent and asymptotically normally distributed. 



1. Introduction. Polynomial phase signals arise in fields including radar, 
sonar, geophysics, speech analysis, biology, and radio communication [1-4].
In radar and sonar applications polynomial phase signals arise when acquir- 
ing radial velocity and acceleration (and higher order motion descriptors) of 
a target from a reflected signal, and also in continuous wave radar and low 
probability of intercept radar. In biology, polynomial phase signals are used 
to describe the sounds emitted by bats and dolphins for echo location. 

A polynomial phase signal of order m is a function of the form 

$s(t) = e^{2\pi j y(t)},$

where $j = \sqrt{-1}$ and $t$ is a real number, often representing time, and

$y(t) = \tilde{\mu}_0 + \tilde{\mu}_1 t + \tilde{\mu}_2 t^2 + \dots + \tilde{\mu}_m t^m$

is a polynomial of order m. In practice the signal is typically sampled at 
discrete points in 'time', t. In this paper we only consider uniform sampling, 
where the gap between consecutive samples is constant. In this case we can 
always consider the samples to be taken at some set of consecutive integers 
and our sampled polynomial phase signal is $s_n = s(n) = e^{2\pi j y(n)}$ where $n$
is an integer. Of practical importance is the estimation of the coefficients 
$\tilde{\mu}_0, \dots, \tilde{\mu}_m$ from a number, say $N$, of consecutive
observations of the noisy sampled signal

(1) $Y_n = \rho s_n + X_n,$

where $\rho$ is a real number greater than zero representing the (usually
unknown) signal amplitude and $\{X_n, n \in \mathbb{Z}\}$ is a sequence of
complex noise variables. In order to ensure identifiability it is necessary to
restrict the $m+1$ coefficients to a region of $m+1$ dimensional Euclidean
space $\mathbb{R}^{m+1}$ called an identifiable region. It was shown by some of
the authors [5] that an identifiable region tessellates a particular $m+1$
dimensional lattice. We discuss this in Section 3.

An obvious estimator of the unknown coefficients is the least squares
estimator. When $m = 0$ (phase estimation) or $m = 1$ (frequency estimation)
the least squares estimator is an effective approach, being both
computationally efficient and statistically accurate [6-8]. When $m \geq 2$ the
computational complexity of the least squares estimator is large, and
alternative estimators have been considered for this reason. These can loosely
be grouped into two classes: estimators based on polynomial phase transforms,
such as the discrete polynomial phase transform [9] and the high order phase
function [10, 11]; and estimators based on phase unwrapping, such as Kitchen's
unwrapping estimator [12], and Morelande's Bayesian unwrapping estimator [13].

In this paper we consider the estimator that results from unwrapping the
phase in a least squares manner. We call this the least squares unwrapping
(LSU) estimator. It was shown by some of the authors [14, Sec. 8.1] [15] that
the LSU estimator can be computed by finding a nearest point in a lattice [16],
and Monte-Carlo simulations were used to show the LSU estimator's favourable
statistical performance. In this paper we derive the asymptotic properties of
the LSU estimator. Under some assumptions about the distribution of the noise
$X_1, \dots, X_N$, we show the estimator to be strongly consistent and
asymptotically normally distributed. Similar results were stated without proof
in [17]. Here, we give a proof. The results here are also more general than in
[17], allowing for a wider class of noise distributions.

An interesting property is that the estimator of the $k$th polynomial phase
coefficient converges almost surely to 0 at rate $o(N^{-k})$. This is perhaps
not surprising, since it is the same rate observed in polynomial regression.
However, asserting that convergence at this rate occurs in the polynomial
phase setting is not trivial. For this purpose we make use of an elementary
result about the number of arithmetic progressions contained inside subsets
of $\{1, 2, \dots, N\}$ [18-20]. The proof of asymptotic normality is complicated by
the fact that the objective function corresponding with the LSU estimator is 
not differentiable everywhere. Empirical process techniques [21-24] and re- 
sults from the literature on hyperplane arrangements [25, 26] become useful 
here. We are hopeful that the proof techniques developed here will be useful 
for purposes other than polynomial phase estimation, and in particular other 
applications involving data that is 'wrapped' in some sense. Potential candi- 
dates are the phase wrapped images observed in modern radar and medical 
imaging devices such as synthetic aperture radar and magnetic resonance 
imaging [27, 28]. 

The paper is organised in the following way. Section 2 describes some pre- 
liminary concepts from lattice theory, and in Section 3 we use these results to 
describe an identifiable region for the set of polynomial phase coefficients. 
These identifiability results are required in order to properly understand 
the statistical properties of polynomial phase estimators. In Section 4 we 
describe the LSU estimator and state its asymptotic statistical properties. 
Section 5 gives the proof of strong consistency and Section 6 gives the proof 
of asymptotic normality. Section 7 describes the results of Monte Carlo sim- 
ulations with the LSU estimator. These simulations agree with the derived 
asymptotic properties. 

2. Lattices. A lattice, $\Lambda$, is a discrete subset of points in $\mathbb{R}^n$ such that

$\Lambda = \{x = Bu ;\ u \in \mathbb{Z}^d\}$

where $B \in \mathbb{R}^{n \times d}$ is an $n \times d$ matrix of rank $d$, called the generator matrix.
If $n = d$ the lattice is said to be full rank. Lattices are discrete Abelian
groups under vector addition. They are subgroups of the Euclidean group
$\mathbb{R}^n$. Lattices naturally give rise to tessellations of $\mathbb{R}^n$ by the specification of
a set of coset representatives for the quotient $\mathbb{R}^n/\Lambda$. One choice for a set
of coset representatives is a fundamental parallelepiped; the parallelepiped
generated by the columns of a generator matrix. Another choice is based
on the Voronoi cell, those points from $\mathbb{R}^n$ nearest (with respect to the
Euclidean norm here) to the lattice point at the origin. It is always possible to
construct a rectangular set of representatives, as the next proposition will 
show. We will use these rectangular regions for describing the aliasing prop- 
erties of polynomial phase signals in Section 3. These rectangular regions 
will be important for the derivation of the asymptotic properties of the LSU 
estimator in Section 4. 

Proposition 1. Let $\Lambda$ be an $n$ dimensional lattice and $B \in \mathbb{R}^{n \times n}$ be
a generator matrix for $\Lambda$. Let $B = QR$ where $Q$ is orthonormal and $R$ is
upper triangular with elements $r_{ij}$. Then the rectangular prism $QP$ where
$P = \prod_{k=1}^{n} [-r_{kk}/2, r_{kk}/2)$ is a set of coset representatives for $\mathbb{R}^n/\Lambda$.

Fig 1. Rectangular tessellation constructed according to Proposition 1 where $\Lambda$ is a 2
dimensional lattice with generator matrix having columns [1, 0.2]' and [0.2, 1]'. Any one of
the boxes is a rectangular set of coset representatives for $\mathbb{R}^2/\Lambda$. The shaded box centered
at the origin is the one given by Proposition 1.

Proof. This result is well known [29, Chapter IX, Theorem IV] [14, 
Proposition 2.1]. This result is for lattices with full rank. A result in the 
general case can be obtained similarly, but is not required here. □ 
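
The construction in Proposition 1 can be illustrated numerically. The following is a minimal sketch, assuming NumPy is available; the function names `rectangular_cell` and `reduce_to_cell` are hypothetical and are introduced here only for illustration. It uses the generator matrix of Fig 1, forces the diagonal of $R$ to be positive, and maps an arbitrary point to its coset representative in the prism $QP$ by back substitution.

```python
import numpy as np

def rectangular_cell(B):
    """QR-factorise B with a positive diagonal on R.

    Returns Q, R and the half-widths r_kk / 2 of the rectangular prism P.
    """
    Q, R = np.linalg.qr(B)
    signs = np.sign(np.diag(R))
    Q = Q * signs            # scale column k of Q by signs[k]
    R = signs[:, None] * R   # scale row k of R by signs[k]; Q @ R is unchanged
    return Q, R, np.diag(R) / 2.0

def reduce_to_cell(x, B):
    """Map x to the point of x + Lambda that lies in the prism Q P."""
    Q, R, _ = rectangular_cell(B)
    y = Q.T @ np.asarray(x, dtype=float)   # coordinates in the rotated frame
    # R is upper triangular, so column k only touches coordinates 0..k.
    for k in range(len(y) - 1, -1, -1):
        u = np.floor(y[k] / R[k, k] + 0.5)  # nearest integer, halves rounded up
        y = y - u * R[:, k]
    return Q @ y

# Generator matrix of Fig 1, with columns [1, 0.2]' and [0.2, 1]'.
B = np.array([[1.0, 0.2],
              [0.2, 1.0]])
Q, R, half_widths = rectangular_cell(B)
z = reduce_to_cell([3.7, -2.2], B)
```

The volume of the box, `np.prod(2 * half_widths)`, agrees with `abs(np.linalg.det(B))`, as it must for any set of coset representatives of a full rank lattice.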

3. Identifiability and aliasing. As discussed in the introduction, a 
polynomial phase signal of order m is a complex valued function of the 
form $s(t) = e^{2\pi j y(t)}$ where $t$ is a real number and $y(t)$ is a polynomial of
order m. We will often drop the (t) and just write the polynomial as y and 
the polynomial phase signal as s whenever there is no chance of ambiguity. 
Aliasing can occur when polynomial-phase signals are sampled. That is, 
two or more distinct polynomial-phase signals can take exactly the same 
values at the sample points. Understanding how aliasing occurs is crucial to 
understanding the behaviour of polynomial phase estimators. The aliasing 
properties are described in [5], but here we present the properties in a way
that is better suited to studying the LSU estimator. 



POLYNOMIAL PHASE ESTIMATION BY PHASE UNWRAPPING 






Let $\mathcal{Z}$ be the set of polynomials of order at most $m$ that take integer
values when evaluated at integers. That is, $\mathcal{Z}$ contains all polynomials $p$
such that $p(n)$ is an integer whenever $n$ is an integer. Let $y$ and $z$ be two
distinct polynomials such that $z = y + p$ for some polynomial $p$ in $\mathcal{Z}$. The
two polynomial phase signals $s(t) = e^{2\pi j y(t)}$ and $r(t) = e^{2\pi j z(t)}$ are distinct
because $y$ and $z$ are distinct, but if we sample $s$ and $r$ at the integers

$s(n) = e^{2\pi j y(n)} = e^{2\pi j y(n)} e^{2\pi j p(n)} = e^{2\pi j (y(n) + p(n))} = e^{2\pi j z(n)} = r(n)$

because $p(n)$ is always an integer and therefore $e^{2\pi j p(n)} = 1$ for all $n \in \mathbb{Z}$.
The polynomial phase signals $s$ and $r$ are equal at the integers, and although
they are distinct, they are indistinguishable from their samples. We call
such polynomial phase signals aliases and immediately obtain the following
theorem.

Theorem 1. Two polynomial phase signals $s(t) = e^{2\pi j y(t)}$ and
$r(t) = e^{2\pi j z(t)}$ are aliases if and only if the polynomials that define their phase, $y$
and $z$, differ by a polynomial from the set $\mathcal{Z}$, that is, $y - z \in \mathcal{Z}$.
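
Aliasing is easy to observe numerically. The sketch below is illustrative only: the phase polynomial `y` and the integer valued polynomial `p` are hypothetical choices, not the polynomials plotted in the figures. It checks that the two signals agree at the integers and differ between them.

```python
import cmath

def pps(poly, t):
    """Polynomial phase signal exp(2*pi*j*poly(t))."""
    return cmath.exp(2j * cmath.pi * poly(t))

def y(t):
    # Hypothetical phase polynomial of order 2.
    return 0.3 + 0.8 * t + 0.1 * t ** 2

def p(t):
    # Takes integer values at the integers: 3 - t + binomial(t, 2).
    return 3 - t + t * (t - 1) / 2

def z(t):
    # Differs from y by a polynomial that is integer valued at integers.
    return y(t) + p(t)

# Largest discrepancy between the two signals over the integers -10..10.
max_gap_at_integers = max(abs(pps(y, n) - pps(z, n)) for n in range(-10, 11))

# Discrepancy at a non-integer time, where the signals are not required to agree.
gap_between_samples = abs(pps(y, 0.5) - pps(z, 0.5))
```

Up to floating point rounding `max_gap_at_integers` is zero, whereas `gap_between_samples` is not small, so the two signals are distinct but indistinguishable from their integer samples.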

It may be helpful to observe Figures 2, 3 and 4. In these, the phase
(divided by $2\pi$) of two distinct polynomial phase signals is plotted on the left,
and on the right the principal component of the phase (also divided by $2\pi$) is
plotted. The circles display the samples at the integers. Note that the
samples of the principal components intersect. The corresponding polynomial
phase signals are aliases.

We can derive an analogue of the theorem above in terms of the coef- 
ficients of the polynomials y and z. This will be useful when we consider 
estimating the coefficients in Section 4. We first need the following family 
of polynomials. 

Definition 1. (Integer valued polynomials)

The integer valued polynomial of order $k$, denoted by $p_k$, is

$p_k(x) = \binom{x}{k} = \frac{x(x-1)(x-2)\cdots(x-k+1)}{k!},$

where we define $p_0(x) = 1$.

Lemma 1. The integer valued polynomials $p_0, \dots, p_m$ are an integer basis
for $\mathcal{Z}$. That is, every polynomial in $\mathcal{Z}$ can be uniquely written as

(2) $c_0 p_0 + c_1 p_1 + \dots + c_m p_m, \qquad c_0, c_1, \dots, c_m \in \mathbb{Z}.$

Proof. See [30, p. 2] or [5]. □

Given a polynomial $g(x) = a_0 + a_1 x + \dots + a_m x^m$, let

$\operatorname{coef}(g) = [a_0 \;\; a_1 \;\; a_2 \;\; \dots \;\; a_m]'$

denote the column vector of length $m + 1$ containing the coefficients of $g$.
We use superscript ' to indicate the vector or matrix transpose. If $y$ and
$z$ differ by a polynomial from $\mathcal{Z}$ then $y = z + p$ where $p \in \mathcal{Z}$ and also
$\operatorname{coef}(y) = \operatorname{coef}(z) + \operatorname{coef}(p)$. Consider the set

$L_{m+1} = \{\operatorname{coef}(p) ;\ p \in \mathcal{Z}\}$

containing the coefficient vectors corresponding to the polynomials in $\mathcal{Z}$.
Since the integer valued polynomials are a basis for $\mathcal{Z}$,

$L_{m+1} = \{\operatorname{coef}(c_0 p_0 + c_1 p_1 + \dots + c_m p_m) ;\ c_i \in \mathbb{Z}\}$
$\phantom{L_{m+1}} = \{c_0 \operatorname{coef}(p_0) + \dots + c_m \operatorname{coef}(p_m) ;\ c_i \in \mathbb{Z}\}.$

Let

$P = [\operatorname{coef}(p_0) \;\; \operatorname{coef}(p_1) \;\; \dots \;\; \operatorname{coef}(p_m)]$

be the $m + 1$ by $m + 1$ matrix with columns given by the coefficients of the
integer valued polynomials. Then,

$L_{m+1} = \{x = Pu ;\ u \in \mathbb{Z}^{m+1}\}$

and it is clear that $L_{m+1}$ is an $m + 1$ dimensional lattice. That is, the set of
coefficients of the polynomials from $\mathcal{Z}$ forms a lattice with generator matrix
$P$. We can restate Theorem 1 as:

Corollary 1. Two polynomial phase signals $s(t) = e^{2\pi j y(t)}$ and
$r(t) = e^{2\pi j z(t)}$ are aliases if and only if $\operatorname{coef}(y)$ and $\operatorname{coef}(z)$ differ by a lattice point
in $L_{m+1}$.
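
The generator matrix $P$ can be built directly from Definition 1. The following is a minimal sketch using exact rational arithmetic; the helper names (`binomial_coeffs`, `generator_matrix`, `lattice_point`, `evaluate`) are hypothetical and exist only for this illustration. It also confirms that a lattice point $Pu$ with integer $u$ is the coefficient vector of a polynomial that is integer valued at the integers.

```python
from fractions import Fraction
from math import factorial

def binomial_coeffs(k):
    """Coefficients, lowest degree first, of p_k(x) = x(x-1)...(x-k+1)/k!."""
    coeffs = [Fraction(1)]
    for i in range(k):
        shifted = [Fraction(0)] + coeffs                  # x * q(x)
        scaled = [i * c for c in coeffs] + [Fraction(0)]  # i * q(x)
        coeffs = [a - b for a, b in zip(shifted, scaled)] # (x - i) * q(x)
    return [c / factorial(k) for c in coeffs]

def generator_matrix(m):
    """P[i][k] is the coefficient of x**i in p_k, for 0 <= i, k <= m."""
    cols = [binomial_coeffs(k) + [Fraction(0)] * (m - k) for k in range(m + 1)]
    return [[cols[k][i] for k in range(m + 1)] for i in range(m + 1)]

def lattice_point(u):
    """Coefficient vector P u of the polynomial u_0 p_0 + ... + u_m p_m."""
    P = generator_matrix(len(u) - 1)
    return [sum(P[i][k] * u[k] for k in range(len(u))) for i in range(len(u))]

def evaluate(coeffs, n):
    """Evaluate the polynomial with the given coefficients at n."""
    return sum(c * n ** i for i, c in enumerate(coeffs))

P = generator_matrix(3)
point = lattice_point([2, -3, 5, 7])
```

For $m = 2$ the columns of $P$ are [1, 0, 0]', [0, 1, 0]' and [0, -1/2, 1/2]', so $P$ is upper triangular with diagonal entries $1/k!$.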

For the purpose of estimating the coefficients of a polynomial phase signal 
we must (in order to ensure identifiability) restrict the set of allowable coef- 
ficients so that no two polynomial phase signals are aliases of each other. In 
consideration of Corollary 1 we require that the coefficients of y(t), written 
in vector form $\tilde{\mu}$, are contained in a set of coset representatives for the
quotient $\mathbb{R}^{m+1}/L_{m+1}$. We call the chosen set of representatives the identifiable
region.

As an example consider the polynomial phase signal of order zero $e^{2\pi j \mu_0}$.
Since $e^{2\pi j \mu_0} = e^{2\pi j (\mu_0 + k)}$ for any integer $k$ we must, in order to ensure
identifiability, restrict $\mu_0$ to some interval of length 1. A natural choice is
the interval $[-1/2, 1/2)$. The lattice $L_1$ is the 1-dimensional integer lattice
$\mathbb{Z}$ and the interval $[-1/2, 1/2)$ corresponds to the Voronoi cell of $L_1$. When
$m = 1$ it turns out that a natural choice of identifiable region is the square
box $[-1/2, 1/2)^2$. This corresponds with the Nyquist criterion. The lattice $L_2$
is equal to $\mathbb{Z}^2$ so the box $[-1/2, 1/2)^2$ corresponds with the Voronoi cell of
$L_2$. When $m > 1$ the identifiable region becomes more complicated and
$L_{m+1} \neq \mathbb{Z}^{m+1}$.

Fig 2. The first order polynomials A (3 + 8t) (solid line) and ^ (33 - 2t) (dashed line).

Fig 3. The quadratic polynomials ^(15 - 15t + 4t^2) (solid line) and (25 - t^2) (dashed
line).

Fig 4. The cubic polynomials ^(174 + 85t - 118t^2 + 40t^3) (solid line) and ^(84 + 19t +
12t^2 - 4t^3) (dashed line).

In general there are infinitely many choices for the identifiable region. A 
natural choice is the Voronoi cell of $L_{m+1}$ used in [5]. Another potential
choice is a fundamental parallelepiped of $L_{m+1}$. In this paper we will use
the rectangular set constructed using Proposition 1. Observe that $P$ is upper
triangular with $k$th diagonal element equal to $1/k!$. So this rectangular set is

(3) $B = \prod_{k=0}^{m} \left[ -\frac{0.5}{k!}, \frac{0.5}{k!} \right).$

We will make use of this set when deriving the statistical properties of the 
LSU estimator in the next section. 

We define the function dealias($x$) to take $x \in \mathbb{R}^{m+1}$ to its coset
representative inside $B$. That is, dealias($x$) $= z \in B$ where $x - z \in L_{m+1}$. When $m = 0$
or 1, dealias($x$) $= \langle x \rangle$ where $\langle x \rangle = x - \lceil x \rfloor$ denotes the (centered) fractional
part and $\lceil x \rfloor$ denotes the nearest integer to $x$ with half integers rounded
upwards and both $\langle \cdot \rangle$ and $\lceil \cdot \rfloor$ operate on vectors elementwise. For $m \geq 2$ the
function dealias($x$) can be computed by a simple sequential algorithm [14,
Sec. 7.2.1].
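
One way to realise such a sequential computation is back substitution against the upper triangular generator matrix $P$. The sketch below is an assumption about how this could be done and is not claimed to be the algorithm of [14, Sec. 7.2.1]; the helper names are hypothetical. Exact rational arithmetic is used so that membership of the box (3) can be checked without rounding error.

```python
from fractions import Fraction
from math import factorial, floor

def binomial_coeffs(k):
    """Coefficients, lowest degree first, of p_k(x) = x(x-1)...(x-k+1)/k!."""
    coeffs = [Fraction(1)]
    for i in range(k):
        shifted = [Fraction(0)] + coeffs
        scaled = [i * c for c in coeffs] + [Fraction(0)]
        coeffs = [a - b for a, b in zip(shifted, scaled)]
    return [c / factorial(k) for c in coeffs]

def generator_matrix(m):
    """P[i][k] is the coefficient of x**i in p_k."""
    cols = [binomial_coeffs(k) + [Fraction(0)] * (m - k) for k in range(m + 1)]
    return [[cols[k][i] for k in range(m + 1)] for i in range(m + 1)]

def dealias(x):
    """Return z in B with x - z in L_{m+1}, where m = len(x) - 1."""
    m = len(x) - 1
    P = generator_matrix(m)
    z = [Fraction(v) for v in x]
    # Work from the highest order coefficient down. Column k of P only
    # touches entries 0..k, so entries fixed earlier are never disturbed.
    for k in range(m, -1, -1):
        u = floor(z[k] / P[k][k] + Fraction(1, 2))  # nearest integer, halves up
        for i in range(k + 1):
            z[i] -= u * P[i][k]
    return z
```

For $m = 1$ the matrix $P$ is the identity and `dealias` reduces to the elementwise centered fractional part, for example `dealias([Fraction(13, 10), Fraction(-4, 5)])` gives `[Fraction(3, 10), Fraction(1, 5)]`.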

4. The least squares unwrapping estimator. We now describe the 
least squares unwrapping (LSU) estimator of the polynomial coefficients. 
Recall that we desire to estimate the coefficients $\tilde{\mu}_0, \dots, \tilde{\mu}_m$ from the noisy
samples $Y_1, \dots, Y_N$ given in (1). We take the complex argument of $Y_n$ and
divide by $2\pi$ to obtain

(4) $\Theta_n = \frac{\angle Y_n}{2\pi} = \langle \Phi_n + y(n) \rangle$

where $\angle$ denotes the complex argument (or phase), and

$\Phi_n = \frac{1}{2\pi} \angle(1 + \rho^{-1} s_n^{-1} X_n)$

is a random variable representing the 'phase noise' induced by $X_n$. If the
distribution of $X_n$ is circularly symmetric (the phase $\angle X_n$ is uniformly
distributed on $[-\pi, \pi)$ and is independent of the magnitude $|X_n|$) then
the distribution of $\Phi_n$ is the same as the distribution of $\frac{1}{2\pi}\angle(1 + \rho^{-1} X_n)$.
If $X_1, \dots, X_N$ are circularly symmetric and identically distributed, then
$\Phi_1, \dots, \Phi_N$ are also identically distributed.
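
The relationship between the observed phase and the phase noise can be checked by simulation. The following is a minimal sketch, assuming NumPy and complex Gaussian noise; the coefficient values, noise level and variable names are hypothetical and chosen only for illustration.

```python
import numpy as np

def centered_frac(x):
    """Centered fractional part <x> = x - round(x), taking values in [-1/2, 1/2)."""
    return x - np.floor(x + 0.5)

rng = np.random.default_rng(0)
N = 64
mu_true = np.array([0.1, -0.2, 0.003])   # hypothetical coefficients, m = 2
rho = 1.0                                # hypothetical amplitude

n = np.arange(1, N + 1)
y = np.polyval(mu_true[::-1], n)         # y(n) = mu_0 + mu_1 n + mu_2 n^2
s = np.exp(2j * np.pi * y)               # sampled polynomial phase signal
X = 0.1 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
Y = rho * s + X                          # noisy observations

theta = np.angle(Y) / (2 * np.pi)                 # observed phase divided by 2*pi
phi = np.angle(1 + X / (rho * s)) / (2 * np.pi)   # phase noise

# theta and <phi + y(n)> should agree as points on the circle.
mismatch = np.max(np.abs(centered_frac(theta - (phi + y))))
```

Here `mismatch` is zero up to floating point rounding. The comparison is made modulo 1 because `np.angle` returns values in $(-\pi, \pi]$ rather than $[-\pi, \pi)$.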
Let $\mu$ be the vector $[\mu_0, \mu_1, \dots, \mu_m]$ and put,

(5) $SS(\mu) = \sum_{n=1}^{N} \left\langle \Theta_n - \sum_{k=0}^{m} \mu_k n^k \right\rangle^2.$

The least squares unwrapping estimator is defined as those coefficients
$\hat{\mu} = [\hat{\mu}_0, \dots, \hat{\mu}_m]$ that minimise $SS$ over the identifiable region $B$, i.e., the LSU
estimator is,

(6) $\hat{\mu} = \arg\min_{\mu \in B} SS(\mu).$

It is shown in [14, Sec 8.1] [15] how this minimisation problem can be
posed as that of computing a nearest lattice point in a particular lattice.
Polynomial time algorithms that compute the nearest point are described
in [14, Sec. 4.3]. Although polynomial in complexity, these algorithms are not
fast in practice. The existence of practically fast nearest point algorithms
for these lattices is an interesting open problem. In this paper we focus
on the asymptotic statistical properties of the LSU estimator, rather than
computational aspects.

The next theorem describes the asymptotic properties of the LSU
estimator. Before we state the theorem it is necessary to understand some of the
properties of the phase noise $\Phi_1, \dots, \Phi_N$, which are circular random
variables with support on $[-1/2, 1/2)$ [8, 14, 31, 32]. Circular random variables
are often considered modulo $2\pi$ and therefore have support $[-\pi, \pi)$ with
$-\pi$ and $\pi$ being identified as equivalent. Here we instead consider circular
random variables modulo 1 with support $[-1/2, 1/2)$ and with $-1/2$ and $1/2$
being equivalent. This is nonstandard but it allows us to use notation such
as $\lceil \cdot \rfloor$ for rounding and $\langle \cdot \rangle$ for the centered fractional part in a convenient
way.

The intrinsic mean (or Fréchet mean) of $\Phi_n$ is defined as [8, 33, 34],

(7) $\mu_{\text{intr}} = \arg\min_{\mu \in [-1/2, 1/2)} E \langle \Phi_n - \mu \rangle^2,$

and the intrinsic variance is

$\sigma^2_{\text{intr}} = E \langle \Phi_n - \mu_{\text{intr}} \rangle^2 = \min_{\mu \in [-1/2, 1/2)} E \langle \Phi_n - \mu \rangle^2,$
where $E$ denotes the expected value. Depending on the distribution of $\Phi_n$
the argument that minimises (7) may not be unique. The set of minima is
often called the Fréchet mean set [33, 34]. If the minimiser is not unique we
say that $\Phi_n$ has no intrinsic mean. Observe the following property of circular
random variables with zero intrinsic mean.
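
To make the definition concrete, the sketch below computes a sample analogue of the intrinsic mean by brute-force search over a grid on $[-1/2, 1/2)$, with the expectation replaced by an average over observations. This is an illustrative assumption rather than a method from the paper; the function name `sample_intrinsic_mean` and the data are hypothetical.

```python
import numpy as np

def centered_frac(x):
    """Centered fractional part, taking values in [-1/2, 1/2)."""
    return x - np.floor(x + 0.5)

def sample_intrinsic_mean(phi, grid_size=1000):
    """Grid minimiser of the average of <phi_n - mu>^2 and the minimum value."""
    grid = np.linspace(-0.5, 0.5, grid_size, endpoint=False)
    cost = np.mean(centered_frac(phi[None, :] - grid[:, None]) ** 2, axis=1)
    best = np.argmin(cost)
    return grid[best], cost[best]

# Observations clustered around the point where -1/2 and 1/2 are identified.
phi = np.array([0.45, -0.45, 0.48, -0.48])
mu_intr, var_intr = sample_intrinsic_mean(phi)
arithmetic_mean = phi.mean()
```

The arithmetic mean of these observations is 0, which lies diametrically opposite the data on the circle, whereas the grid search returns $-1/2$ (equivalently $1/2$), the point about which the observations actually cluster.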

Proposition 2. Let $\Phi$ be a circular random variable with intrinsic mean
$\mu_{\text{intr}} = 0$ and intrinsic variance $\sigma^2$. Then $\Phi$ has zero mean and variance $\sigma^2$,
that is, $E\Phi = 0$ and $E(\Phi - E\Phi)^2 = \sigma^2$.

Proof. Assume the proposition is false and that $\mu = E\Phi \neq 0$. But, then

$\sigma^2 = E\langle \Phi - \mu_{\text{intr}} \rangle^2 = E\langle \Phi \rangle^2 = E\Phi^2 > E(\Phi - \mu)^2 \geq E\langle \Phi - \mu \rangle^2,$

violating the fact that $\mu_{\text{intr}} = 0$ is the minimiser of (7). □

We are now equipped to state the asymptotic properties of the LSU esti- 
mator. 

Theorem 2. Let $\hat{\mu}$ be defined by (6) and put $\hat{\lambda}_N = \operatorname{dealias}(\tilde{\mu} - \hat{\mu})$.
Denote the elements of $\hat{\lambda}_N$ by $\hat{\lambda}_{0,N}, \dots, \hat{\lambda}_{m,N}$. Suppose $\Phi_1, \dots, \Phi_N$ are
independent and identically distributed with zero intrinsic mean, intrinsic variance
$\sigma^2$, and probability density function $f$, then:

1. (Strong consistency) $N^k \hat{\lambda}_{k,N}$ converges almost surely to 0 as $N \to \infty$
for all $k = 0, 1, \dots, m$.

2. (Asymptotic normality) If $f(\langle x \rangle)$ is continuous at $x = -1/2$ and if
$f(-1/2) < 1$ then the distribution of the vector

$[\sqrt{N}\hat{\lambda}_{0,N} \;\; N\sqrt{N}\hat{\lambda}_{1,N} \;\; \dots \;\; N^m\sqrt{N}\hat{\lambda}_{m,N}]'$

converges to the normal with zero mean and covariance

$\frac{\sigma^2}{(1 - f(-1/2))^2} C^{-1},$

where $C$ is the $m + 1$ by $m + 1$ Hilbert matrix with elements
$C_{ik} = 1/(i + k + 1)$ for $i, k \in \{0, 1, \dots, m\}$.
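
The limiting covariance in the theorem is simple to evaluate for small $m$. The following is a minimal sketch using exact rational arithmetic and a plain Gauss-Jordan inverse; the function names are hypothetical, and the values passed for $\sigma^2$ and $f(-1/2)$ are placeholders rather than quantities taken from the paper.

```python
from fractions import Fraction

def hilbert(m):
    """The (m+1) x (m+1) Hilbert matrix with entries 1/(i + k + 1)."""
    return [[Fraction(1, i + k + 1) for k in range(m + 1)] for i in range(m + 1)]

def inverse(A):
    """Exact inverse by Gauss-Jordan elimination on the augmented matrix [A | I]."""
    n = len(A)
    M = [list(row) + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    return [row[n:] for row in M]

def asymptotic_covariance(m, sigma2, h):
    """sigma^2 / (1 - h)^2 * C^{-1}, with h standing for f(-1/2)."""
    scale = Fraction(sigma2) / (1 - Fraction(h)) ** 2
    return [[scale * v for v in row] for row in inverse(hilbert(m))]
```

For $m = 1$ the inverse Hilbert matrix is [[4, -6], [-6, 12]], so under the theorem the limiting variance of $\sqrt{N}\hat{\lambda}_{0,N}$ is $4\sigma^2/(1 - f(-1/2))^2$ and that of $N\sqrt{N}\hat{\lambda}_{1,N}$ is $12\sigma^2/(1 - f(-1/2))^2$.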

The proof of Theorem 2 is contained within the next two sections. Sec- 
tion 5 proves strong consistency and Section 6 proves asymptotic normality. 
Proofs for the case when $m = 0$ were given in [8] and for the case when
$m = 1$ were given in [35]. The proofs here take a similar approach, but
require new techniques. The theorem gives conditions on $\operatorname{dealias}(\tilde{\mu} - \hat{\mu})$ rather
than directly on the difference $\tilde{\mu} - \hat{\mu}$. To see why this makes sense, consider
the case when $m = 0$, $\tilde{\mu}_0 = -0.5$ and $\hat{\mu}_0 = 0.49$, so that $\tilde{\mu}_0 - \hat{\mu}_0 = -0.99$.
However, the two phases are obviously close, since the phases $\pm 0.5$ are
actually the same. In this case $\operatorname{dealias}(\tilde{\mu}_0 - \hat{\mu}_0) = \langle \tilde{\mu}_0 - \hat{\mu}_0 \rangle = 0.01$ as expected.
The same reasoning holds for $m > 0$.
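
The $m = 0$ case can be reproduced numerically. The sketch below minimises the sum of squared wrapped residuals by brute-force grid search, which is a deliberately naive stand-in for the nearest lattice point algorithms and is not the method of [14]. The uniform phase noise, the seed, and the names `lsu_phase_grid`, `raw_difference` and `wrapped_difference` are assumptions made for illustration.

```python
import numpy as np

def centered_frac(x):
    """Centered fractional part, taking values in [-1/2, 1/2)."""
    return x - np.floor(x + 0.5)

def lsu_phase_grid(theta, grid_size=2000):
    """Brute-force LSU estimate for m = 0 over a grid on [-1/2, 1/2)."""
    grid = np.linspace(-0.5, 0.5, grid_size, endpoint=False)
    ss = np.sum(centered_frac(theta[None, :] - grid[:, None]) ** 2, axis=1)
    return grid[np.argmin(ss)]

rng = np.random.default_rng(1)
mu_true = -0.5                                # true phase on the wrap boundary
phi = rng.uniform(-0.05, 0.05, size=200)      # hypothetical phase noise
theta = centered_frac(phi + mu_true)          # observed phases for m = 0

mu_hat = lsu_phase_grid(theta)
raw_difference = mu_true - mu_hat             # may be close to 0 or to -1
wrapped_difference = centered_frac(raw_difference)
```

Whichever side of the boundary the estimate lands on, `wrapped_difference` is small, while `raw_difference` can be close to $-1$. This is the reason the theorem is stated in terms of the dealiased difference.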

The requirement that $\Phi_1, \dots, \Phi_N$ be identically distributed will typically
hold only when the complex random variables $X_1, \dots, X_N$ are identically
distributed and circularly symmetric. It would be possible to drop the
assumption that $\Phi_1, \dots, \Phi_N$ be identically distributed, but this complicates
the theorem statement and the proof. In the interest of simplicity we only
consider the case when $\Phi_1, \dots, \Phi_N$ are identically distributed here. If $X_n$
is circularly symmetric with density function nonincreasing with magnitude
$|X_n|$, then the corresponding $\Phi_n$ necessarily has zero intrinsic mean [14,
Theorem 5.2]. Thus, our theorem covers commonly used distributions for
$X_1, \dots, X_N$, such as the normal distribution.

Although we will not prove it here the assumption that $\Phi_1, \dots, \Phi_N$ have
zero intrinsic mean is not only sufficient, but also necessary, for if $\Phi_1, \dots, \Phi_N$
have intrinsic mean $x \in [-1/2, 1/2)$ with $x \neq 0$ then $\langle \hat{\lambda}_{0,N} - x \rangle \to 0$ almost
surely as $N \to \infty$, and so $\hat{\lambda}_{0,N}$ does not converge to zero. On the other hand
if $\Phi_1, \dots, \Phi_N$ do not have an intrinsic mean then $\hat{\lambda}_{0,N}$ will not converge.

The proof of asymptotic normality places requirements on the probability
density function $f$ of the phase noise $\Phi_1, \dots, \Phi_N$. The requirement that
$\Phi_1, \dots, \Phi_N$ have zero intrinsic mean implies $f(-1/2) \leq 1$ [8, Lemma 1], so the
only case not handled is when $f(-1/2) = 1$ or when $f(\langle x \rangle)$ is discontinuous
at $x = -1/2$. In this exceptional case other expressions for the asymptotic
variance can be found (similar to [36, Theorem 3.1]), but this comes at a
substantial increase in complexity and we have omitted them for this reason.

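5. Proof of strong consistency. Substituting (4) into $SS$ we obtain

$SS(\mu) = \sum_{n=1}^{N} \left\langle \Phi_n + \sum_{k=0}^{m} (\tilde{\mu}_k - \mu_k) n^k \right\rangle^2.$

Let $\lambda = \operatorname{dealias}(\tilde{\mu} - \mu) = \tilde{\mu} - \mu - p$ where $p$ is a lattice point from $L_{m+1}$.
From the definition of $L_{m+1}$ we have $p_0 + p_1 n + \dots + p_m n^m$ an integer
whenever $n$ is an integer, so

$\left\langle \sum_{k=0}^{m} \lambda_k n^k \right\rangle = \left\langle \sum_{k=0}^{m} (\tilde{\mu}_k - \mu_k - p_k) n^k \right\rangle = \left\langle \sum_{k=0}^{m} (\tilde{\mu}_k - \mu_k) n^k \right\rangle.$

Let

$SS(\mu) = \sum_{n=1}^{N} \left\langle \Phi_n + \sum_{k=0}^{m} \lambda_k n^k \right\rangle^2 = N S_N(\lambda).$

From the definition of the dealias($\cdot$) function $\lambda \in B$ so the elements of $\lambda$
satisfy

(8) $-\frac{0.5}{k!} \leq \lambda_k < \frac{0.5}{k!}.$

Now $\hat{\lambda}_N = \operatorname{dealias}(\tilde{\mu} - \hat{\mu})$ is the minimiser of $S_N$ in $B$. We shall show that
$N^k \hat{\lambda}_{k,N} \to 0$ almost surely as $N \to \infty$ for all $k = 0, 1, \dots, m$ and from this
the proof of strong consistency follows. Let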

$V_N(\lambda) = E S_N(\lambda) = \frac{1}{N} \sum_{n=1}^{N} E \left\langle \Phi_n + \sum_{k=0}^{m} \lambda_k n^k \right\rangle^2.$

It will follow that

(9) $\sup_{\lambda \in B} |S_N(\lambda) - V_N(\lambda)| \to 0$ almost surely as $N \to \infty$.

This type of result has been called a uniform law of large numbers and follows 
from standard techniques [37] . We give a full proof of (9) in Appendix A. 
We now concentrate attention on the minimiser of $V_N$. Because $\Phi_n$ has zero
intrinsic mean

(10) $E\langle \Phi_n + z \rangle^2$

is minimised uniquely at $z = 0$ for $z \in [-1/2, 1/2)$. Since the intrinsic variance
of $\Phi_n$ is $\sigma^2$, when $z = 0$,

(11) $E\langle \Phi_1 + z \rangle^2 = E\langle \Phi_1 \rangle^2 = \sigma^2,$

and so the minimum attained value is $\sigma^2$.

Lemma 2. For $\lambda \in B$ the function $V_N(\lambda)$ is minimised uniquely at $\mathbf{0}$,
the vector of all zeros. At this minimum $V_N(\mathbf{0}) = \sigma^2$.
Proof. Put $z(n) = \lambda_0 + \lambda_1 n + \dots + \lambda_m n^m$. Then

$V_N(\lambda) = \frac{1}{N} \sum_{n=1}^{N} E \left\langle \Phi_n + \sum_{k=0}^{m} \lambda_k n^k \right\rangle^2 = \frac{1}{N} \sum_{n=1}^{N} E \langle \Phi_n + \langle z(n) \rangle \rangle^2.$

We know that $E\langle \Phi_n + \langle z(n) \rangle \rangle^2$ is minimised uniquely when $\langle z(n) \rangle = 0$ at
which point it takes the value $\sigma^2$. Now $\langle z(n) \rangle$ is equal to zero for all integers
$n$ if and only if $z \in \mathcal{Z}$, or equivalently if $\operatorname{coef}(z)$ is a lattice point in $L_{m+1}$.
By definition $B$ contains precisely one lattice point from $L_{m+1}$, this being
the origin $\mathbf{0}$. Therefore $V_N$ is minimised uniquely at $\mathbf{0}$, at which point it
takes the value $\sigma^2$. □

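Lemma 3. $|V_N(\hat{\lambda}_N) - \sigma^2| \to 0$ almost surely as $N \to \infty$.

Proof. By definition $\hat{\lambda}_N = \arg\min_{\lambda \in B} S_N(\lambda)$ so $0 \leq S_N(\mathbf{0}) - S_N(\hat{\lambda}_N)$.
Also, because $V_N$ is minimised at $\mathbf{0}$, it follows that $0 \leq V_N(\hat{\lambda}_N) - V_N(\mathbf{0})$.
Thus,

$0 \leq V_N(\hat{\lambda}_N) - V_N(\mathbf{0})$

$\phantom{0} \leq V_N(\hat{\lambda}_N) - V_N(\mathbf{0}) + S_N(\mathbf{0}) - S_N(\hat{\lambda}_N)$

$\phantom{0} \leq |V_N(\hat{\lambda}_N) - S_N(\hat{\lambda}_N)| + |S_N(\mathbf{0}) - V_N(\mathbf{0})|$

which converges almost surely to zero as $N \to \infty$ as a result of (9). □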

We have now shown that $V_N$ is uniquely minimised at $\mathbf{0}$, that $V_N(\mathbf{0}) = \sigma^2$,
and that $V_N(\hat{\lambda}_N)$ converges almost surely to $\sigma^2$. These results are enough
to show that $\hat{\lambda}_N$ converges almost surely to zero. However, this tells us
nothing about the rate at which the components of $\hat{\lambda}_N$ approach zero as
required by Theorem 2. To prove these stronger properties we need some
preliminary results about arithmetic progressions, and from the calculus of
finite differences.

Let $W = \{1, 2, \dots, N\}$ and let $K$ be a subset of $W$. For any integer $h$, let

(12) $A(h, K) = \{n ;\ n + ih \in K \;\; \forall\, i \in \{0, 1, \dots, m\}\}$

be the set containing all integers n such that the arithmetic progression 

n, n + h, n + 2h, . . . , n + mh 

of length m + 1 is contained in the subset K. If K is a small subset of 
W then A(h, K) might be empty. However, the next two lemmas and the 
following corollary will show that if $K$ is sufficiently large then it always
contains at least one arithmetic progression (for all sufficiently small $h$) and
therefore $A(h, K)$ is not empty. We do not wish to claim any novelty here,
the study of arithmetic progressions within subsets of $W$ has a considerable
history [18-20]. In particular, Gowers [20, Theorem 1.3] gives a result far
stronger than we require here. Denote by $K \backslash \{r\}$ the set $K$ with the element
$r$ removed.

Lemma 4. Let $r \in K$. For any $h$, removing $r$ from $K$ removes at most
$m + 1$ arithmetic progressions $n, n + h, \dots, n + mh$ of length $m + 1$. That is,

$|A(h, K \backslash \{r\})| \geq |A(h, K)| - (m + 1).$

Proof. The proof follows because there are at most $m + 1$ integers, $n$,
such that $n + ih = r$ for some $i \in \{0, 1, \dots, m\}$. That is, there are at most
$m + 1$ arithmetic progressions of type $n, n + h, \dots, n + mh$ that contain $r$. □

Lemma 5. $|A(h, K)| \geq N - mh - (N - |K|)(m + 1)$.

Proof. Note that $|A(h, W)| = N - mh$. The proof follows by starting
with $A(h, W)$ and applying Lemma 4 precisely $|W| - |K| = N - |K|$ times.
That is, $K$ can be constructed by removing $N - |K|$ elements from $W$
and this removes at most $(N - |K|)(m + 1)$ arithmetic progressions from
$A(h, W)$. □

Corollary 2. Let $K \subseteq W$ such that $|K| > \frac{2m+1}{2m+2} N$. For all $h$ such
that $1 \leq h \leq \frac{N}{2m}$ the set $K$ contains at least one arithmetic progression
$n, n + h, \dots, n + mh$ of length $m + 1$. That is, $|A(h, K)| > 0$.

Proof. By substituting the bounds $|K| > \frac{2m+1}{2m+2} N$ and $h \leq \frac{N}{2m}$ into the
inequality from Lemma 5 we immediately obtain $|A(h, K)| > 0$. □
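
The counting argument can be checked by direct enumeration. The sketch below computes $A(h, K)$ by brute force for a randomly thinned subset of $W$; the values of $N$ and $m$, the seed, and the function names are hypothetical choices made for illustration.

```python
import random

def progressions(h, K, m):
    """A(h, K): all n such that n, n + h, ..., n + m*h all lie in K."""
    K = set(K)
    return {n for n in K if all(n + i * h in K for i in range(m + 1))}

def lemma5_bound(N, size_K, h, m):
    """Lower bound N - m*h - (N - |K|)(m + 1) on |A(h, K)|."""
    return N - m * h - (N - size_K) * (m + 1)

N, m = 100, 2
rng = random.Random(0)
W = set(range(1, N + 1))
K = W - set(rng.sample(sorted(W), 10))   # |K| = 90 > (2m+1)/(2m+2) * N

# Count progressions for every step size allowed by the corollary.
counts = {h: len(progressions(h, K, m)) for h in range(1, N // (2 * m) + 1)}
```

For every step size `h` in `counts`, the count is at least `lemma5_bound(N, len(K), h, m)`, and in particular it is strictly positive, as the corollary requires.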

The next result we require comes from the calculus of finite differences. 
For any function $d(n)$ mapping $\mathbb{R}$ to $\mathbb{R}$, let

$\Delta_h d(n) = d(n + h) - d(n)$

denote the first difference with interval $h$, and let

(13) $\Delta_h^r d(n) = \Delta_h^{r-1} d(n + h) - \Delta_h^{r-1} d(n) = \sum_{k=0}^{r} \binom{r}{k} (-1)^{r-k} d(n + kh)$

denote the $r$th difference with interval $h$. Since $\sum_{k=0}^{r} \binom{r}{k} = 2^r$ it follows that
$\Delta_h^r d(n)$ can be represented by adding and subtracting the

$d(n), d(n + h), \dots, d(n + rh)$

precisely $2^r$ times.

The operator $\Delta_h^r$ has special properties when applied to polynomials. If
$d(n) = a_r n^r + \dots + a_0$ is a polynomial of order $r$ then

(14) $\Delta_h^r d(n) = h^r r! \, a_r.$

So, the $r$th difference of the polynomial is a constant depending on $h$, $r$ and
the $r$th coefficient $a_r$ [38, page 51]. We can now continue the proof of strong
consistency. The next lemma is a key result. 
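
Identity (14) is easy to confirm numerically before it is used. The following is a minimal sketch that evaluates the binomial sum in (13) directly; the cubic `d` and the function name `rth_difference` are hypothetical.

```python
from math import comb, factorial

def rth_difference(d, n, h, r):
    """The r-th difference of d at n with interval h, via the binomial sum (13)."""
    return sum(comb(r, k) * (-1) ** (r - k) * d(n + k * h) for k in range(r + 1))

def d(n):
    # Hypothetical cubic with leading coefficient a_3 = 5.
    return 5 * n ** 3 - 2 * n + 7

h, r = 4, 3
values = [rth_difference(d, n, h, r) for n in range(-3, 4)]
expected = h ** r * factorial(r) * 5
```

Every entry of `values` equals `expected`, regardless of the starting point `n`, because the lower order terms of the polynomial are annihilated by the $r$th difference.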

Lemma 6. Suppose $\lambda_1, \lambda_2, \dots$ is a sequence of vectors from $B$ with
$V_N(\lambda_N) - \sigma^2 \to 0$ as $N \to \infty$. Then the elements $\lambda_{0,N}, \dots, \lambda_{m,N}$ of $\lambda_N$
satisfy $N^k \lambda_{k,N} \to 0$ as $N \to \infty$.

Proof. Define the function

(15) $g(z) = E\langle \Phi_1 + z \rangle^2 - \sigma^2$

which is continuous in $z$. Because of (10) and (11), $g(z) \geq 0$ with equality
only at $z = 0$ for $z \in [-1/2, 1/2)$. Now

$V_N(\lambda_N) - \sigma^2 = \frac{1}{N} \sum_{n=1}^{N} g\left( \left\langle \sum_{k=0}^{m} n^k \lambda_{k,N} \right\rangle \right) \to 0$

as $N \to \infty$. Let

$z_N(n) = \lambda_{0,N} + \lambda_{1,N} n + \lambda_{2,N} n^2 + \dots + \lambda_{m,N} n^m$

so that $V_N(\lambda_N) - \sigma^2 = \frac{1}{N} \sum_{n=1}^{N} g(\langle z_N(n) \rangle) \to 0$ as $N \to \infty$. Choose
constants

$\frac{2m+1}{2m+2} < c < 1$ and $0 < \delta < \frac{1}{2^{2m+1}}$

and define the set $K_N = \{n \leq N ;\ |\langle z_N(n) \rangle| < \delta\}$. There exists $N_0$ such
that for all $N > N_0$ the number of elements in $K_N$ is at least $cN$. To see
this, suppose that $|K_N| < cN$, and let $\gamma$ be the minimum value of $g$ over
$[-1/2, -\delta] \cup [\delta, 1/2)$. Because $g(0) = 0$ is the unique minimiser of $g$, then $\gamma$ is
strictly greater than 0 and

$V_N(\lambda_N) - \sigma^2 = \frac{1}{N} \sum_{n=1}^{N} g(\langle z_N(n) \rangle) \geq \frac{1}{N} \sum_{n \notin K_N} \gamma \geq (1 - c)\gamma,$

violating that $V_N(\lambda_N) - \sigma^2$ converges to zero as $N \to \infty$. We will assume
$N > N_0$ in what follows.

From Corollary 2 it follows that for all $h$ satisfying $1 \leq h \leq \frac{N}{2m}$ the set
$A(h, K_N)$ contains at least one element, that is, there exists $n' \in A(h, K_N)$
such that all the elements from the arithmetic progression $n', n' + h, \dots, n' + mh$
are in $K_N$ and therefore

$|\langle z_N(n') \rangle|, \; |\langle z_N(n' + h) \rangle|, \; \dots, \; |\langle z_N(n' + mh) \rangle|$

are all less than $\delta$. Because the $m$th difference is a linear combination of $2^m$
elements (see (13)) from

$\langle z_N(n') \rangle, \; \langle z_N(n' + h) \rangle, \; \dots, \; \langle z_N(n' + mh) \rangle$

all with magnitude less than $\delta$ we obtain, from Lemma 7,

(16) $|\langle \Delta_h^m z_N(n') \rangle| \leq |\Delta_h^m \langle z_N(n') \rangle| < 2^m \delta.$

From (14) it follows that the left hand side is equal to a constant involving
$h$, $m$ and $\lambda_{m,N}$ giving the bound

(17) $|\langle h^m m! \lambda_{m,N} \rangle| = |\langle \Delta_h^m z_N(n') \rangle| < 2^m \delta$

for all $h$ satisfying $1 \leq h \leq \frac{N}{2m}$. Setting $h = 1$ and recalling from (8) that
$\lambda_{m,N} \in [-\frac{0.5}{m!}, \frac{0.5}{m!})$, we have

$|\langle m! \lambda_{m,N} \rangle| = |m! \lambda_{m,N}| < 2^m \delta.$
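Now, because we chose $\delta < \frac{1}{2^{2m+1}}$ it follows that

$|\lambda_{m,N}| < \frac{2^m \delta}{m!} < \frac{1}{m! \, 2^{m+1}}.$

So, when $h = 2$,

$|\langle 2^m m! \lambda_{m,N} \rangle| = |2^m m! \lambda_{m,N}| < 2^m \delta$

because $2^m m! \lambda_{m,N} \in [-0.5, 0.5)$. Therefore

$|\lambda_{m,N}| < \frac{\delta}{m!} < \frac{1}{m! \, 2^{2m+1}}.$

Now, with $h = 4$, we similarly obtain

$|\langle 4^m m! \lambda_{m,N} \rangle| = |4^m m! \lambda_{m,N}| < 2^m \delta$

and iterating this process we eventually obtain

$|\lambda_{m,N}| < \frac{2^m \delta}{2^{um} m!}$

where $2^u$ is the largest power of 2 less than or equal to $\frac{N}{2m}$. By substituting
$2^{u+1} > \frac{N}{2m}$ it follows that

(18) $N^m |\lambda_{m,N}| < \frac{(4m)^m 2^m \delta}{m!}$

for all $N > N_0$. As $\delta$ is arbitrary, $N^m \lambda_{m,N} \to 0$ as $N \to \infty$.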

We have now shown that the highest order coefficient $\lambda_{m,N}$ converges as
required. The remaining coefficients will be shown to converge by induction.
Assume that $N^k \lambda_{k,N} \to 0$ for all $k = r + 1, r + 2, \dots, m$, that is, assume that
the $m - r$ highest order coefficients all converge as required. Let

$z_{N,r}(n) = \lambda_{0,N} + \lambda_{1,N} n + \lambda_{2,N} n^2 + \dots + \lambda_{r,N} n^r.$

Because the $m - r$ highest order coefficients converge we can write
$z_N(n) = z_{N,r}(n) + \gamma_N(n)$ where $\sup_{n \in \{1, \dots, N\}} |\gamma_N(n)| \to 0$ as $N \to \infty$. Now the bound
from (16), but applied using the $r$th difference, gives

(19) $|\langle \Delta_h^r z_N(n') \rangle| = |\langle \Delta_h^r \gamma_N(n') + \Delta_h^r z_{N,r}(n') \rangle| = |\langle \epsilon + h^r r! \lambda_{r,N} \rangle| < 2^r \delta,$

where

$\epsilon = \Delta_h^r \gamma_N(n') \leq 2^r \sup_{n \in \{1, \dots, N\}} |\gamma_N(n)| \to 0$

as $N \to \infty$. Choose $\delta$ and $\epsilon$ such that $2^r \delta < \frac{1}{4}$ and $|\epsilon| < \frac{1}{4}$. Then, from (19)
and from Lemma 8,

$|\langle h^r r! \lambda_{r,N} \rangle| < 2^r \delta + |\epsilon|$

for all $h$ such that $1 \leq h \leq \frac{N}{2m}$. Choosing $2^r \delta + |\epsilon| < \frac{1}{2^{r+1}}$ and using the same
iterative process as for the highest order coefficient $\lambda_{m,N}$ (see (17) to (18)) we
find that $N^r \lambda_{r,N} \to 0$ as $N \to \infty$. The proof now follows by induction. □

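Lemma 7. Let $a_1, a_2, \dots, a_r$ be $r$ real numbers such that $|\langle a_n \rangle| < \delta$ for
all $n = 1, 2, \dots, r$. Then $|\langle \sum_{n=1}^{r} a_n \rangle| < r\delta$.

Proof. If $\delta > \frac{1}{2r}$ the proof is trivial as $|\langle \sum_{n=1}^{r} a_n \rangle| \leq \frac{1}{2}$ for all
$a_n \in \mathbb{R}$. If $\delta \leq \frac{1}{2r}$ then $\langle \sum_{n=1}^{r} a_n \rangle = \sum_{n=1}^{r} \langle a_n \rangle$ and

$\left| \left\langle \sum_{n=1}^{r} a_n \right\rangle \right| \leq \sum_{n=1}^{r} |\langle a_n \rangle| < r\delta.$

□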
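Lemma 8. Let $|\langle a + \epsilon \rangle| < \delta$ where $|\epsilon| < \tfrac14$ and $0 < \delta < \tfrac14$. Then $|\langle a \rangle| < \delta + |\epsilon|$.

Proof. By supposition $n - \delta < a + \epsilon < n + \delta$ for some $n \in \mathbb{Z}$. Since $-\delta - \epsilon > -\tfrac12$ and $\delta - \epsilon < \tfrac12$, it follows that

$$n - \tfrac12 < n - \delta - \epsilon < a < n + \delta - \epsilon < n + \tfrac12.$$

Hence $\langle a \rangle = a - n$ and so

$$-\delta - |\epsilon| \le -\delta - \epsilon < \langle a \rangle < \delta - \epsilon \le \delta + |\epsilon|$$

and $|\langle a \rangle| < \delta + |\epsilon|$.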

□ 
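Lemmas 7 and 8 are elementary statements about the centred fractional part, and they can be spot-checked numerically. The following is a minimal sketch, not part of the proofs; the helper names `frac`, `check_lemma_7` and `check_lemma_8` are hypothetical, and `frac` assumes the convention that $\langle x \rangle$ takes values in $[-1/2, 1/2)$.

```python
import math


def frac(x):
    """Centred fractional part <x> = x - round(x), with values in [-1/2, 1/2)."""
    return x - math.floor(x + 0.5)


def check_lemma_7(a, delta):
    """Given |<a_n>| < delta for every n, test the claim |<sum a_n>| < r * delta."""
    assert all(abs(frac(x)) < delta for x in a)  # hypothesis of Lemma 7
    return abs(frac(sum(a))) < len(a) * delta


def check_lemma_8(a, eps, delta):
    """Given |<a + eps>| < delta, |eps| < 1/4, 0 < delta < 1/4, test |<a>| < delta + |eps|."""
    assert abs(eps) < 0.25 and 0.0 < delta < 0.25
    assert abs(frac(a + eps)) < delta  # hypothesis of Lemma 8
    return abs(frac(a)) < delta + abs(eps)
```

Both functions first assert the hypothesis and then return whether the conclusion holds, so a returned `False` would indicate a counterexample.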



We are now in a position to complete the proof of strong consistency. Let $A$ be the subset of the sample space on which $V_N(\hat{\lambda}_N) - \sigma^2 \to 0$ as $N \to \infty$. From Lemma 3, $\Pr\{A\} = 1$. Let $A'$ be the subset of the sample space on which $N^k \hat{\lambda}_{k,N} \to 0$ for $k = 0, \dots, m$ as $N \to \infty$. As a result of Lemma 6, $A \subseteq A'$, and so $\Pr\{A'\} \ge \Pr\{A\} = 1$. Strong consistency follows.

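6. Proof of asymptotic normality. Let $\psi$ be the vector with $k$th component $\psi_k = N^k \lambda_k$, $k = 0, \dots, m$, and let

$$T_N(\psi) = S_N(\lambda) = \frac{1}{N} \sum_{n=1}^{N} \Big\langle \Phi_n + \sum_{k=0}^{m} \Big(\frac{n}{N}\Big)^k \psi_k \Big\rangle^2.$$

Let $\hat{\psi}_N$ be the vector with elements $\hat{\psi}_{k,N} = N^k \hat{\lambda}_{k,N}$ so that $\hat{\psi}_N$ is the minimiser of $T_N$. Because each of $N^k \hat{\lambda}_{k,N}$ converges almost surely to zero as $N \to \infty$, $\hat{\psi}_N$ converges almost surely to $\mathbf{0}$ as $N \to \infty$. We want to find the asymptotic distribution of

$$\sqrt{N} \hat{\psi}_N = \big[ \sqrt{N} \hat{\lambda}_{0,N},\ N\sqrt{N} \hat{\lambda}_{1,N},\ \dots,\ N^m \sqrt{N} \hat{\lambda}_{m,N} \big]'.$$

The proof is complicated by the fact that $T_N$ is not differentiable everywhere because $\langle x \rangle^2$ is not differentiable when $\langle x \rangle = -\tfrac12$. This precludes the use of 'standard approaches' to proving asymptotic normality that are based on the mean value theorem [21, 22, 37, 39]. However, we show in Lemma 9 that all the partial derivatives $\frac{\partial T_N}{\partial \psi_\ell}$ for $\ell = 0, \dots, m$ exist, and are equal to zero, at the minimiser $\hat{\psi}_N$. Thus, putting

$$W_n = \Big\lceil \Phi_n + \sum_{k=0}^{m} \Big(\frac{n}{N}\Big)^k \hat{\psi}_{k,N} \Big\rfloor \tag{20}$$

so that

$$T_N(\hat{\psi}_N) = \frac{1}{N} \sum_{n=1}^{N} \Big( \Phi_n - W_n + \sum_{k=0}^{m} \Big(\frac{n}{N}\Big)^k \hat{\psi}_{k,N} \Big)^2,$$

we have,

$$0 = \frac{\partial T_N}{\partial \psi_\ell}(\hat{\psi}_N) = \frac{2}{N} \sum_{n=1}^{N} \Big(\frac{n}{N}\Big)^\ell \Big( \Phi_n - W_n + \sum_{k=0}^{m} \Big(\frac{n}{N}\Big)^k \hat{\psi}_{k,N} \Big)$$

for each $\ell = 0, \dots, m$. Now $D_{\ell,N} = K_{\ell,N}$, where $D_{\ell,N} = \frac{1}{\sqrt{N}} \sum_{n=1}^{N} \big(\frac{n}{N}\big)^\ell \Phi_n$, and

$$K_{\ell,N} = \frac{1}{\sqrt{N}} \sum_{n=1}^{N} \Big(\frac{n}{N}\Big)^\ell \Big( W_n - \sum_{k=0}^{m} \Big(\frac{n}{N}\Big)^k \hat{\psi}_{k,N} \Big). \tag{21}$$

Lemma 11 shows that,

$$K_{\ell,N} = (h - 1)\sqrt{N} \sum_{k=0}^{m} \hat{\psi}_{k,N} \big( C_{\ell k} + o_P(1) \big) + o_P(1), \tag{22}$$

for all $\ell = 0, \dots, m$, where $C_{\ell k} = \frac{1}{\ell + k + 1}$, and $h = f(-1/2)$, and $o_P(1)$ denotes a random variable converging in probability to zero as $N \to \infty$.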
It is now convenient to write in vector form. Let

$$k_N = \big[ K_{0,N}\ K_{1,N}\ \dots\ K_{m,N} \big]' \quad \text{and} \quad d_N = \big[ D_{0,N}\ D_{1,N}\ \dots\ D_{m,N} \big]'. \tag{23}$$

From (22),

$$d_N = k_N = \sqrt{N}(h - 1)\big(C + o_P(1)\big)\hat{\psi}_N + o_P(1)$$

where $o_P(1)$ here means a vector or matrix of the appropriate dimension with every element converging in probability to zero as $N \to \infty$. Thus $\sqrt{N}\hat{\psi}_N$ has the same asymptotic distribution as $(h - 1)^{-1} C^{-1} d_N$. Lemma 13 shows that $d_N$ is asymptotically normally distributed with zero mean and covariance matrix $\sigma^2 C$. Thus $\sqrt{N}\hat{\psi}_N$ is asymptotically normal with zero mean and covariance matrix

$$\frac{\sigma^2 C^{-1} C (C^{-1})'}{(1 - h)^2} = \frac{\sigma^2 C^{-1}}{(1 - h)^2}.$$
It remains to prove Lemmas 9, 11 and 13. 
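The limiting covariance $\sigma^2 C^{-1}/(1-h)^2$ is straightforward to evaluate because $C$, with entries $C_{\ell k} = 1/(\ell + k + 1)$, is a Hilbert matrix. The following is a minimal sketch using exact rational arithmetic; the function names are hypothetical, and the values of `sigma2` and `h` must be supplied by the user since they depend on the noise distribution.

```python
from fractions import Fraction


def hilbert(m):
    """(m+1) x (m+1) matrix C with entries C[l][k] = 1 / (l + k + 1)."""
    return [[Fraction(1, l + k + 1) for k in range(m + 1)] for l in range(m + 1)]


def inverse(a):
    """Exact matrix inverse by Gauss-Jordan elimination over the rationals."""
    n = len(a)
    aug = [list(row) + [Fraction(int(i == j)) for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        # Find a row with a nonzero pivot and move it into place.
        pivot = next(r for r in range(col, n) if aug[r][col] != 0)
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def asymptotic_covariance(m, sigma2, h):
    """Evaluate sigma^2 C^{-1} / (1 - h)^2 as a matrix of floats."""
    scale = sigma2 / (1.0 - h) ** 2
    return [[scale * float(v) for v in row] for row in inverse(hilbert(m))]
```

For example, `asymptotic_covariance(1, sigma2, h)` scales the matrix with rows `[4, -6]` and `[-6, 12]`, the inverse of the $2 \times 2$ Hilbert matrix.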
Lemma 9. For all $\ell = 0, \dots, m$ the partial derivatives $\frac{\partial T_N}{\partial \psi_\ell}$ exist, and are equal to zero, at the minimiser $\hat{\psi}_N$. That is, $\frac{\partial T_N}{\partial \psi_\ell}(\hat{\psi}_N) = 0$ for each $\ell = 0, \dots, m$.

Proof. The function $\langle x \rangle^2$ is differentiable wherever $\langle x \rangle \ne -\tfrac12$, and so $T_N$ is differentiable with respect to $\psi$ at $\hat{\psi}_N$ if $\big\langle \Phi_n + \sum_{k=0}^{m} (\frac{n}{N})^k \hat{\psi}_{k,N} \big\rangle \ne -\tfrac12$ for all $n = 1, \dots, N$. This is proved in Lemma 10. So the partial derivatives $\frac{\partial T_N}{\partial \psi_\ell}$ exist for all $\ell = 0, \dots, m$ at $\hat{\psi}_N$. That each of the partial derivatives is equal to zero at $\hat{\psi}_N$ follows since $\hat{\psi}_N$ is a minimiser of $T_N$. □

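Lemma 10. $\big|\big\langle \Phi_n + \sum_{k=0}^{m} (\frac{n}{N})^k \hat{\psi}_{k,N} \big\rangle\big| \le \frac12 - \frac{1}{2N}$ for all $n = 1, \dots, N$.

Proof. To simplify our notation let $B_n = \Phi_n + \sum_{k=1}^{m} (\frac{n}{N})^k \hat{\psi}_{k,N}$ so that we now require to prove $|\langle B_n + \hat{\psi}_{0,N} \rangle| \le \frac12 - \frac{1}{2N}$ for all $n = 1, \dots, N$. From (20), $W_n = \lceil B_n + \hat{\psi}_{0,N} \rfloor$ and

$$N T_N(\hat{\psi}_N) = \sum_{n=1}^{N} \langle B_n + \hat{\psi}_{0,N} \rangle^2 = \sum_{n=1}^{N} (B_n + \hat{\psi}_{0,N} - W_n)^2.$$

Since $\hat{\psi}_{0,N}$ is the minimiser of the quadratic above,

$$\hat{\psi}_{0,N} = -\frac{1}{N} \sum_{n=1}^{N} (B_n - W_n). \tag{24}$$

The proof now proceeds by contradiction. Assume that for some $k$,

$$\langle B_k + \hat{\psi}_{0,N} \rangle > \frac12 - \frac{1}{2N}. \tag{25}$$

Let $F_n = W_n$ for all $n \ne k$ and $F_k = W_k + 1$, and let

$$\phi = -\frac{1}{N} \sum_{n=1}^{N} (B_n - F_n),$$

so that $\phi = \hat{\psi}_{0,N} + \frac{1}{N}$ by (24). Now,

$$\begin{aligned}
(B_k + \phi - F_k)^2 &= (B_k + \phi - W_k - 1)^2 \\
&= (B_k + \phi - W_k)^2 - 2(B_k + \phi - W_k) + 1 \\
&= (B_k + \phi - W_k)^2 - 2(B_k + \hat{\psi}_{0,N} - W_k) + 1 - \tfrac{2}{N} \\
&= (B_k + \phi - W_k)^2 - 2\langle B_k + \hat{\psi}_{0,N} \rangle + 1 - \tfrac{2}{N} \\
&< (B_k + \phi - W_k)^2 - \tfrac{1}{N},
\end{aligned} \tag{26}$$

where the inequality in the last line follows from (25). Let $b = [\phi, \hat{\psi}_{1,N}, \dots, \hat{\psi}_{m,N}]$ be the vector of length $m + 1$ with components $b_0 = \phi$ and $b_\ell = \hat{\psi}_{\ell,N}$ for $\ell = 1, \dots, m$. Now, because each $F_n$ is an integer,

$$N T_N(b) = \sum_{n=1}^{N} \langle B_n + \phi \rangle^2 \le \sum_{n=1}^{N} (B_n + \phi - F_n)^2,$$

and using the inequality from (26),

$$\begin{aligned}
N T_N(b) &< -\frac{1}{N} + \sum_{n=1}^{N} (B_n + \phi - W_n)^2 \\
&= -\frac{1}{N} + \sum_{n=1}^{N} \Big(B_n + \hat{\psi}_{0,N} + \frac{1}{N} - W_n\Big)^2 \\
&= -\frac{1}{N} + \frac{1}{N} + \sum_{n=1}^{N} (B_n + \hat{\psi}_{0,N} - W_n)^2 + \frac{2}{N} \sum_{n=1}^{N} (B_n + \hat{\psi}_{0,N} - W_n) \\
&= N T_N(\hat{\psi}_N),
\end{aligned}$$

because $\frac{2}{N} \sum_{n=1}^{N} (B_n + \hat{\psi}_{0,N} - W_n) = 0$ as a result of (24). But now $T_N(b) < T_N(\hat{\psi}_N)$, violating the fact that $\hat{\psi}_N$ is a minimiser of $T_N$. So (25) is false by contradiction.

If $\langle B_k + \hat{\psi}_{0,N} \rangle < -\frac12 + \frac{1}{2N}$ for some $k$ we set $F_k = W_k - 1$ and using the same procedure as before obtain $T_N(b) < T_N(\hat{\psi}_N)$ again. The proof follows. □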
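Lemma 10 can be illustrated in the simplest case $m = 0$, where $T_N$ depends on a single offset $\psi_0$. The sketch below is an illustration under stated assumptions and not the estimator used in the paper: it relies on the observation that, for $m = 0$, the optimal unwrapping shifts the $j$ smallest phases up by one cycle for some $j \in \{0, \dots, N-1\}$, so that $N$ candidates suffice. The names `minimise_order_zero` and `lemma_10_margin` are hypothetical.

```python
import math


def frac(x):
    """Centred fractional part <x>, with values in [-1/2, 1/2)."""
    return x - math.floor(x + 0.5)


def minimise_order_zero(phi):
    """Minimise sum_n <phi_n + psi>^2 over psi by trying N candidate unwrappings."""
    n = len(phi)
    u = sorted(phi)
    best_cost, best_psi = float("inf"), 0.0
    for j in range(n):
        # With the unwrapping fixed, the best offset is minus the sample mean, as in (24).
        mean = sum(u) / n
        cost = sum((x - mean) ** 2 for x in u)
        if cost < best_cost:
            best_cost, best_psi = cost, frac(-mean)
        u[j] += 1.0  # unwrap the j-th smallest phase upwards by one cycle
    return best_psi


def lemma_10_margin(phi):
    """Return (1/2 - 1/(2N)) - max_n |<phi_n + psi_hat>|, nonnegative if Lemma 10 holds."""
    n = len(phi)
    psi = minimise_order_zero(phi)
    return (0.5 - 0.5 / n) - max(abs(frac(p + psi)) for p in phi)
```

A nonnegative margin (up to rounding error) for every realisation of the phases is what Lemma 10 predicts.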

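Lemma 11. With $K_{\ell,N}$ defined in (21), $h = f(-1/2)$, and $C_{\ell k} = \frac{1}{\ell + k + 1}$ we have $K_{\ell,N} = (h - 1)\sqrt{N} \sum_{k=0}^{m} \hat{\psi}_{k,N} (C_{\ell k} + o_P(1)) + o_P(1)$ for all $\ell = 0, \dots, m$.

Proof. Care must be taken since $\hat{\psi}_N$ depends on the sequence $\{\Phi_n\}$. For $n = 1, \dots, N$ and positive $N$, let

$$p_{nN}(\psi) = \sum_{k=0}^{m} \Big(\frac{n}{N}\Big)^k \psi_k \tag{27}$$

and put $q_n(x) = \lceil \Phi_n + x \rfloor$ and $Q(x) = \mathbb{E} q_n(x) = \mathbb{E} q_1(x)$. Let

$$G_{\ell,N}(\psi) = \frac{1}{\sqrt{N}} \sum_{n=1}^{N} \Big(\frac{n}{N}\Big)^\ell \Big( q_n(p_{nN}(\psi)) - Q(p_{nN}(\psi)) \Big), \tag{28}$$

and put

$$p_{nN} = p_{nN}(\hat{\psi}_N) = \sum_{k=0}^{m} \Big(\frac{n}{N}\Big)^k \hat{\psi}_{k,N}. \tag{29}$$

Now $W_n$ from (20) can be written as $W_n = \lceil \Phi_n + p_{nN} \rfloor = q_n(p_{nN})$ and $K_{\ell,N}$ from (21) can be written as

$$\begin{aligned}
K_{\ell,N} &= \frac{1}{\sqrt{N}} \sum_{n=1}^{N} \Big(\frac{n}{N}\Big)^\ell \big( q_n(p_{nN}) - p_{nN} \big) \\
&= \frac{1}{\sqrt{N}} \sum_{n=1}^{N} \Big(\frac{n}{N}\Big)^\ell \big( q_n(p_{nN}) - p_{nN} + Q(p_{nN}) - Q(p_{nN}) \big) \\
&= G_{\ell,N}(\hat{\psi}_N) + H_{\ell,N},
\end{aligned}$$

where

$$H_{\ell,N} = \frac{1}{\sqrt{N}} \sum_{n=1}^{N} \Big(\frac{n}{N}\Big)^\ell \big( Q(p_{nN}) - p_{nN} \big). \tag{30}$$

Lemma 18 in the Appendix shows that for any $\delta > 0$ and $\nu > 0$ there exists an $\epsilon > 0$ such that

$$\Pr\Big\{ \sup_{\|\psi\|_\infty < \epsilon} |G_{\ell,N}(\psi)| > \delta \Big\} < \nu$$

for all positive integers $N$ and all $\ell = 0, \dots, m$, where $\|\psi\|_\infty = \sup_k |\psi_k|$. Since $\hat{\psi}_N$ converges almost surely to zero, it follows that

$$\lim_{N \to \infty} \Pr\{ \|\hat{\psi}_N\|_\infty \ge \epsilon \} = 0$$

for any $\epsilon > 0$, and therefore $\Pr\{\|\hat{\psi}_N\|_\infty \ge \epsilon\} < \nu$ for all sufficiently large $N$. Now

$$\begin{aligned}
\Pr\{ |G_{\ell,N}(\hat{\psi}_N)| > \delta \} &= \Pr\{ |G_{\ell,N}(\hat{\psi}_N)| > \delta,\ \|\hat{\psi}_N\|_\infty < \epsilon \} + \Pr\{ |G_{\ell,N}(\hat{\psi}_N)| > \delta,\ \|\hat{\psi}_N\|_\infty \ge \epsilon \} \\
&\le \Pr\Big\{ \sup_{\|\psi\|_\infty < \epsilon} |G_{\ell,N}(\psi)| > \delta \Big\} + \Pr\{ \|\hat{\psi}_N\|_\infty \ge \epsilon \} \\
&< 2\nu
\end{aligned}$$

for all sufficiently large $N$. Since $\nu$ and $\delta$ can be chosen arbitrarily small, it follows that $G_{\ell,N}(\hat{\psi}_N)$ converges in probability to zero as $N \to \infty$, and therefore $K_{\ell,N} = H_{\ell,N} + o_P(1)$. Lemma 12 shows that

$$H_{\ell,N} = (h - 1)\sqrt{N} \sum_{k=0}^{m} \hat{\psi}_{k,N} \big( C_{\ell k} + o_P(1) \big).$$

□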



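Lemma 12. With $H_{\ell,N}$ defined in (30), $h = f(-1/2)$, and $C_{\ell k} = \frac{1}{\ell + k + 1}$ we have $H_{\ell,N} = (h - 1)\sqrt{N} \sum_{k=0}^{m} \hat{\psi}_{k,N} (C_{\ell k} + o_P(1))$.

Proof. If $|x| < 1$, then

$$q_n(x) = \lceil \Phi_n + x \rfloor = \begin{cases} 1, & \Phi_n + x \ge 1/2 \\ -1, & \Phi_n + x < -1/2 \\ 0, & \text{otherwise,} \end{cases}$$

and,

$$Q(x) = \mathbb{E} q_1(x) = \begin{cases} \int_{1/2 - x}^{1/2} f(t)\,dt, & x \ge 0 \\ -\int_{-1/2}^{-1/2 - x} f(t)\,dt, & x < 0. \end{cases}$$

Because $f(\langle x \rangle)$ is continuous at $-1/2$ it follows that $Q(x) = x(h + \zeta(x))$ where $\zeta(x)$ is a function that converges to zero as $x$ converges to zero. Observe that $|p_{nN}| \le \sum_{k=0}^{m} |\hat{\psi}_{k,N}|$ and, since each of the $\hat{\psi}_{k,N} \to 0$ almost surely as $N \to \infty$, it follows that $p_{nN} \to 0$ almost surely uniformly in $n = 1, \dots, N$ as $N \to \infty$. Thus, $\zeta(p_{nN}) \to 0$ almost surely (and therefore also in probability) uniformly in $n = 1, \dots, N$ as $N \to \infty$. Now,

$$Q(p_{nN}) - p_{nN} = p_{nN}\big(h - 1 + \zeta(p_{nN})\big) = p_{nN}\big(h - 1 + o_P(1)\big),$$

and, using (30),

$$\begin{aligned}
H_{\ell,N} &= \frac{1}{\sqrt{N}} \sum_{n=1}^{N} \Big(\frac{n}{N}\Big)^\ell p_{nN} \big(h - 1 + o_P(1)\big) \\
&= \frac{1}{\sqrt{N}} \sum_{n=1}^{N} \sum_{k=0}^{m} \Big(\frac{n}{N}\Big)^{\ell + k} \hat{\psi}_{k,N} \big(h - 1 + o_P(1)\big) \\
&= \sqrt{N} \sum_{k=0}^{m} \hat{\psi}_{k,N} \sum_{n=1}^{N} \frac{n^{\ell + k}}{N^{\ell + k + 1}} \big(h - 1 + o_P(1)\big).
\end{aligned}$$

The Riemann sum

$$\sum_{n=1}^{N} \frac{n^{\ell + k}}{N^{\ell + k + 1}} = \int_0^1 x^{\ell + k}\,dx + o(1),$$

and since the integral above evaluates to $C_{\ell k} = \frac{1}{k + \ell + 1}$, we have

$$H_{\ell,N} = (h - 1)\sqrt{N} \sum_{k=0}^{m} \hat{\psi}_{k,N} \big( C_{\ell k} + o_P(1) \big).$$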



□ 
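The Riemann sum step in the proof of Lemma 12 is easy to check numerically. The following is a minimal sketch with hypothetical function names, comparing the sum against the entries $C_{\ell k} = 1/(\ell + k + 1)$.

```python
def riemann_sum(power, n_terms):
    """Evaluate (1/N) * sum_{n=1}^{N} (n/N)^power, which tends to 1/(power + 1)."""
    return sum((n / n_terms) ** power for n in range(1, n_terms + 1)) / n_terms


def hilbert_entry(ell, k):
    """The limit C_{ell k} = 1 / (ell + k + 1)."""
    return 1.0 / (ell + k + 1)
```

For a right-endpoint Riemann sum of a smooth increasing integrand the error decays like $1/N$, consistent with the $o(1)$ term above.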



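Lemma 13. The distribution of the vector $d_N$, defined in (23), converges to the multivariate normal with zero mean and covariance matrix $\sigma^2 C$.

Proof. For any constant vector $\alpha$, let

$$z_N = \alpha' d_N = \frac{1}{\sqrt{N}} \sum_{n=1}^{N} \Phi_n \sum_{\ell=0}^{m} \alpha_\ell \Big(\frac{n}{N}\Big)^\ell.$$

By Lyapunov's central limit theorem $z_N$ is asymptotically normally distributed with zero mean and variance

$$\lim_{N \to \infty} \frac{\sigma^2}{N} \sum_{n=1}^{N} \Big( \sum_{\ell=0}^{m} \alpha_\ell \Big(\frac{n}{N}\Big)^\ell \Big)^2 = \sigma^2 \alpha' C \alpha.$$

By the Cramér-Wold theorem it follows that $d_N$ is asymptotically normally distributed with zero mean and covariance $\sigma^2 C$. □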
7. Simulations. This section describes the results of Monte-Carlo simulations with the least squares unwrapping (LSU) estimator. The sample sizes considered are $N = 10, 50, 200$ and the unknown amplitude is $\rho = 1$. The $X_1, \dots, X_N$ are pseudorandomly generated independent and identically distributed circularly symmetric complex Gaussian random variables with variance $\sigma_c^2$. The coefficients $\tilde{\mu} = [\tilde{\mu}_0, \dots, \tilde{\mu}_m]$ are distributed uniformly randomly in the identifiable region $B$. Each experiment is replicated $T = 2000$ times to obtain estimates $\hat{\mu}_1, \dots, \hat{\mu}_T$, and the corresponding dealiased errors $\hat{\lambda}_t = \operatorname{dealias}(\hat{\mu}_t - \tilde{\mu})$ are computed. The sample mean square error (MSE) of the $k$th coefficient is computed according to $\frac{1}{T} \sum_{t=1}^{T} \hat{\lambda}_{k,t}^2$ where $\hat{\lambda}_{k,t}$ is the $k$th element of $\hat{\lambda}_t$.

Figure 5 shows the sample MSEs obtained for a polynomial phase signal of order $m = 3$. Results are displayed for the zeroth and third order coefficients $\tilde{\mu}_0$ and $\tilde{\mu}_3$. The results for $\tilde{\mu}_1$ and $\tilde{\mu}_2$ lead to similar conclusions. When $N = 10$ and 50 the LSU estimator can be computed exactly
using a general purpose algorithm for finding nearest lattice points called 
the sphere decoder [16, 40, 41]. This is displayed by the circles in the fig- 
ures. When N = 200 the sphere decoder is computationally intractable and 
we instead use an approximate nearest point algorithm called the K-best 
method [42]. This is displayed by the dots. For the purpose of comparison 
we have also plotted the results for the K-best method when N = 10 and 50. 
The asymptotic variance predicted in Theorem 2 is displayed by the dashed 
line. Provided the noise variance is small enough (so that the 'threshold' is 
avoided) the sample MSE of the LSU estimator is close to that predicted 
by Theorem 2. The Cramér-Rao lower bound for the variance of unbiased
polynomial phase estimators in Gaussian noise is also plotted using the solid
line [43]. When the noise variance is small the asymptotic variance of the
LSU estimator is close to the Cramér-Rao lower bound.

8. Conclusion. This paper has considered the estimation of the coeffi- 
cients of a noisy polynomial phase signal by least squares phase unwrapping 
(LSU). It has been shown that the LSU estimator is strongly consistent 
and asymptotically normally distributed. Polynomial time algorithms that 
compute the LSU estimator are described in [14], but these are slow algo- 
rithms in practice. A significant outstanding question is whether practically 
fast algorithms exist. Considering the excellent statistical performance (both 
theoretically and practically) of the LSU estimator, even fast approximate 
algorithms are likely to prove useful for the estimation of polynomial phase 
signals. 




Fig 5. Sample mean square error (MSE) of the least squares unwrapping estimator for
$N = 10, 50$ and 200 for a polynomial phase signal of order $m = 3$. (Top) MSE of the
frequency coefficient. (Bottom) MSE of the cubic coefficient $\tilde{\mu}_3$.
APPENDIX A: A UNIFORM LAW OF LARGE NUMBERS

During the proof of strong consistency we made use of the fact that

$$\sup_{\lambda \in B} |S_N(\lambda) - V_N(\lambda)| \to 0 \tag{31}$$

almost surely as $N \to \infty$, where $V_N(\lambda) = \mathbb{E} S_N(\lambda)$. We prove this result here. Put $D_N(\lambda) = S_N(\lambda) - V_N(\lambda)$. Now $\sum_{N=1}^{\infty} \Pr\{\sup_{\lambda \in B} |D_N(\lambda)| > \epsilon\} < \infty$ for any $\epsilon > 0$ by Lemma 14, and (31) follows from the Borel-Cantelli lemma. In what follows we use order notation in the standard way, that is, for functions $h$ and $g$, we write $h(N) = O(g(N))$ to mean that there exists a constant $K > 0$ and a finite $N_0$ such that $h(N) \le K g(N)$ for all $N > N_0$.

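Lemma 14. $\Pr\{\sup_{\lambda \in B} |D_N(\lambda)| > \epsilon\} = O(e^{-c\epsilon^2 N})$ for any $\epsilon > 0$ and $c < 2$.

Proof. Consider a rectangular grid of points spaced over the identifiable region $B$. We use $\lambda[r]$, where $r \in \mathbb{Z}^{m+1}$, to denote the grid point

$$\lambda[r] = \Big[ \frac{r_0}{N^b} - \frac12,\ \frac{r_1}{N^{b+1}} - \frac12,\ \dots,\ \frac{r_m}{m!\,N^{b+m}} - \frac{1}{2(m!)} \Big]$$

for some constant $b > 0$. Adjacent grid points are separated by $\frac{1}{N^b}$ in the zeroth coordinate, $\frac{1}{N^{b+1}}$ in the first coordinate and $\frac{1}{k!\,N^{b+k}}$ in the $k$th coordinate. Let

$$B[r] = \Big\{ x \in \mathbb{R}^{m+1} ;\ \frac{r_k}{k!\,N^{b+k}} \le x_k + \frac{1}{2(k!)} < \frac{r_k + 1}{k!\,N^{b+k}} \Big\}$$

and let $G$ be the finite set of grid points

$$G = \big\{ x \in \mathbb{Z}^{m+1} ;\ x_k = 0, 1, 2, \dots, N^{b+k} - 1 \big\}.$$

The total number of grid points is $|G| = N^{(m+1)(2b+m)/2}$, and the $B[r]$ partition $B$, that is, $B = \cup_{r \in G} B[r]$. Now

$$\begin{aligned}
\sup_{\lambda \in B} |D_N(\lambda)| &= \sup_{r \in G} \sup_{\lambda \in B[r]} |D_N(\lambda[r]) + D_N(\lambda) - D_N(\lambda[r])| \\
&\le \sup_{r \in G} |D_N(\lambda[r])| + \sup_{r \in G} \sup_{\lambda \in B[r]} |D_N(\lambda) - D_N(\lambda[r])|.
\end{aligned} \tag{32}$$

From Lemma 15 it will follow that $\Pr\{\sup_{r \in G} |D_N(\lambda[r])| > \frac{\epsilon}{2}\} = O(e^{-c\epsilon^2 N})$ for any $\epsilon > 0$ and $c < 2$. In Lemma 17 we show that

$$\sup_{r \in G} \sup_{\lambda \in B[r]} |D_N(\lambda) - D_N(\lambda[r])| \le 2\frac{m+1}{N^b}.$$

Combining these results with (32), we obtain

$$\Pr\Big\{ \sup_{\lambda \in B} |D_N(\lambda)| > \frac{\epsilon}{2} + 2\frac{m+1}{N^b} \Big\} = O(e^{-c\epsilon^2 N}),$$

and for sufficiently large $N$, we have $\frac{\epsilon}{2} + 2\frac{m+1}{N^b} < \epsilon$, completing the proof. It remains to prove Lemmas 15 and 17. □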

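Lemma 15. $\Pr\{\sup_{r \in G} |D_N(\lambda[r])| > \epsilon\} = O(e^{-c\epsilon^2 N})$ for any $\epsilon > 0$ and $c < 8$.

Proof. Fix $\lambda$ and write $D_N(\lambda) = \frac{1}{N} \sum_{n=1}^{N} Z_n$, where

$$Z_n = \Big\langle \Phi_n + \sum_{k=0}^{m} \lambda_k n^k \Big\rangle^2 - \mathbb{E} \Big\langle \Phi_n + \sum_{k=0}^{m} \lambda_k n^k \Big\rangle^2$$

are independent with zero mean and $|Z_n| \le \tfrac14$. It follows from Hoeffding's inequality [44] that $\Pr\{|D_N(\lambda)| > \epsilon\} \le 2e^{-8\epsilon^2 N}$, and so,

$$\Pr\Big\{ \sup_{r \in G} |D_N(\lambda[r])| > \epsilon \Big\} \le \sum_{r \in G} \Pr\{ |D_N(\lambda[r])| > \epsilon \} \le 2|G| e^{-8\epsilon^2 N} = O(e^{-c\epsilon^2 N}),$$

where $c$ is any real number less than 8, since $|G| = N^{(m+1)(2b+m)/2}$ is polynomial in $N$. □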
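The pointwise Hoeffding bound $2e^{-8\epsilon^2 N}$ used above can be compared with simulation. The sketch below assumes, purely for illustration, that the phase noise $\Phi_n$ is uniform on $[-1/2, 1/2)$; in that case $\langle \Phi_n + c \rangle$ is uniform for every constant $c$, so $V_N(\lambda) = 1/12$ for all $\lambda$. The function names are hypothetical.

```python
import math


def frac(x):
    """Centred fractional part <x>, with values in [-1/2, 1/2)."""
    return x - math.floor(x + 0.5)


def s_n(phi, lam):
    """S_N(lambda) = (1/N) sum_n <phi_n + sum_k lambda_k n^k>^2."""
    total = 0.0
    for n, p in enumerate(phi, start=1):
        poly = sum(c * n ** k for k, c in enumerate(lam))
        total += frac(p + poly) ** 2
    return total / len(phi)


def empirical_tail(lam, n_samples, eps, trials, rng):
    """Estimate Pr{|D_N(lambda)| > eps} under uniform phase noise (V_N = 1/12)."""
    exceed = 0
    for _ in range(trials):
        phi = [rng.uniform(-0.5, 0.5) for _ in range(n_samples)]
        if abs(s_n(phi, lam) - 1.0 / 12.0) > eps:
            exceed += 1
    return exceed / trials


def hoeffding_bound(n_samples, eps):
    """The bound 2 exp(-8 eps^2 N) from the proof of Lemma 15."""
    return 2.0 * math.exp(-8.0 * eps ** 2 * n_samples)
```

In this uniform-noise setting the empirical tail is typically far below the bound, which is expected since Hoeffding's inequality uses only the range of $Z_n$ and not its variance.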

Before proving Lemma 17 we need the following result. 

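Lemma 16. $\langle x \rangle^2 - |\delta| \le \langle x + \delta \rangle^2 \le \langle x \rangle^2 + |\delta|$ for all $x, \delta \in \mathbb{R}$.

Proof. Since $|\delta| \le |n + \delta|$ for all $\delta \in [-1/2, 1/2)$ and $n \in \mathbb{Z}$, the result will follow if we can show that it holds when both $x$ and $\delta$ are in $[-1/2, 1/2)$. Also, for reasons of symmetry, we need only show that it holds when $\delta \ge 0$. Now

$$\langle x + \delta \rangle^2 - x^2 = \begin{cases} 2x\delta + \delta^2, & x \in [-1/2, 1/2 - \delta) \\ 2x(\delta - 1) + (\delta - 1)^2, & x \in [1/2 - \delta, 1/2). \end{cases}$$

But, when $x \in [-1/2, 1/2 - \delta)$,

$$-1 \le -1 + \delta \le 2x + \delta < 1 - \delta \le 1,$$

and so

$$-\delta \le (-1 + \delta)\delta \le (2x + \delta)\delta \le (1 - \delta)\delta \le \delta.$$

Also, when $x \in [1/2 - \delta, 1/2)$ we have $-\delta \le 2x + \delta - 1 < \delta$, and consequently

$$-\delta \le -\delta(1 - \delta) \le (2x + \delta - 1)(1 - \delta) \le \delta(1 - \delta) \le \delta,$$

which bounds $2x(\delta - 1) + (\delta - 1)^2 = -(2x + \delta - 1)(1 - \delta)$ between $-\delta$ and $\delta$.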

□ 
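Lemma 16 states that $\langle \cdot \rangle^2$ changes by at most $|\delta|$ when its argument is shifted by $\delta$. A minimal numerical check is sketched below; the function name `lemma_16_holds` and the tolerance `tol` (which only absorbs floating point rounding) are assumptions of this illustration.

```python
import math


def frac(x):
    """Centred fractional part <x>, with values in [-1/2, 1/2)."""
    return x - math.floor(x + 0.5)


def lemma_16_holds(x, delta, tol=1e-12):
    """Test <x>^2 - |delta| <= <x + delta>^2 <= <x>^2 + |delta|."""
    base = frac(x) ** 2
    shifted = frac(x + delta) ** 2
    return base - abs(delta) - tol <= shifted <= base + abs(delta) + tol
```

This Lipschitz-type property is what allows the supremum over the continuous region $B$ to be controlled by a finite grid in Lemma 17.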

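Lemma 17. $\sup_{r \in G} \sup_{\lambda \in B[r]} |D_N(\lambda) - D_N(\lambda[r])| \le 2\frac{m+1}{N^b}$ for all $N$.

Proof. Put $b_n = \sum_{k=0}^{m} \lambda_k n^k$ and $a_n = \sum_{k=0}^{m} \lambda[r]_k n^k$, where $\lambda[r]_k$ denotes the $k$th element of the grid point $\lambda[r]$. For $\lambda \in B[r]$ we have $b_n = a_n + \delta_n$, where $|\delta_n| \le \sum_{k=0}^{m} \frac{n^k}{k!\,N^{b+k}} \le \frac{m+1}{N^b}$. From Lemma 16 it follows that $-|\delta_n| \le \langle x + b_n \rangle^2 - \langle x + a_n \rangle^2 \le |\delta_n|$, and consequently $|\langle x + b_n \rangle^2 - \langle x + a_n \rangle^2| \le \frac{m+1}{N^b}$ for all $x \in \mathbb{R}$. Now

$$S_N(\lambda) - S_N(\lambda[r]) = \frac{1}{N} \sum_{n=1}^{N} \Big( \langle \Phi_n + b_n \rangle^2 - \langle \Phi_n + a_n \rangle^2 \Big)$$

and therefore $|S_N(\lambda) - S_N(\lambda[r])| \le \frac{m+1}{N^b}$ for all $\lambda \in B[r]$. As this bound is independent of $\Phi_1, \dots, \Phi_N$ we have

$$|V_N(\lambda) - V_N(\lambda[r])| \le \mathbb{E}|S_N(\lambda) - S_N(\lambda[r])| \le \frac{m+1}{N^b}$$

by Jensen's inequality. Therefore, for all $\lambda \in B[r]$,

$$\begin{aligned}
|D_N(\lambda) - D_N(\lambda[r])| &= |S_N(\lambda) - S_N(\lambda[r]) - V_N(\lambda) + V_N(\lambda[r])| \\
&\le |S_N(\lambda) - S_N(\lambda[r])| + |V_N(\lambda) - V_N(\lambda[r])| \\
&\le 2\frac{m+1}{N^b},
\end{aligned}$$

and the lemma follows because this bound is independent of $r$. □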

APPENDIX B: A TIGHTNESS RESULT 

During the proof of asymptotic normality in Lemma 11 we made use of the following result regarding the function,

$$G_{\ell,N}(\psi) = \frac{1}{\sqrt{N}} \sum_{n=1}^{N} \Big(\frac{n}{N}\Big)^\ell \Big( q_n(p_{nN}(\psi)) - Q(p_{nN}(\psi)) \Big),$$

where the functions $q_n$, $Q$ and $p_{nN}$ are defined above (28) and $\ell \in \{0, 1, \dots, m\}$. To simplify notation we drop the subscript $\ell$ and write $G_{\ell,N}$ as $G_N$ in what follows. The proof we will give holds for any nonnegative integer $\ell$.
Lemma 18. For any $\delta > 0$ and $\nu > 0$ there exists $\epsilon > 0$ such that

$$\Pr\Big\{ \sup_{\|\psi\|_\infty < \epsilon} |G_N(\psi)| > \delta \Big\} < \nu$$

for all positive integers $N$.

This result is related to what is called tightness or asymptotic continuity in the literature on empirical processes and weak convergence on metric spaces [23, 24, 45, 46]. The lemma is different from what is usually proved in the literature because the function $p_{nN}(\psi) = \sum_{k=0}^{m} (\frac{n}{N})^k \psi_k$ depends on $n$. Nevertheless, the methods of proof from the literature can be used if we include a known result about hyperplane arrangements [25, Ch. 5] [26, Ch. 6]. Our proof is based on a technique called symmetrisation and another technique called chaining (also known as bracketing) [22, 23].

Proof. Define the function

$$f_{nN}(\psi, \Phi_n) = \Big(\frac{n}{N}\Big)^\ell q_n(p_{nN}(\psi)) = \Big(\frac{n}{N}\Big)^\ell \Big\lceil \Phi_n + \sum_{k=0}^{m} \Big(\frac{n}{N}\Big)^k \psi_k \Big\rfloor$$

so that $G_N$ can be written as

$$G_N(\psi) = \frac{1}{\sqrt{N}} \sum_{n=1}^{N} \Big( f_{nN}(\psi, \Phi_n) - \mathbb{E} f_{nN}(\psi, \Phi_n) \Big).$$

Let $\{g_n\}$ be a sequence of independent standard normal random variables, independent of the phase noise sequence $\{\Phi_n\}$. The symmetrisation argument [22, Sec. 4] [23, 47] can be used to show that

$$\mathbb{E} \sup_{\|\psi\|_\infty < \epsilon} |G_N(\psi)| \le \sqrt{2\pi}\, \mathbb{E} \sup_{\|\psi\|_\infty < \epsilon} |Z_N(\psi)|$$

where

$$Z_N(\psi) = \frac{1}{\sqrt{N}} \sum_{n=1}^{N} g_n f_{nN}(\psi, \Phi_n), \tag{33}$$

and where $\mathbb{E}$ runs over both $\{g_n\}$ and $\{\Phi_n\}$. Conditionally on $\{\Phi_n\}$, the process $\{Z_N(\psi), \psi \in \mathbb{R}^{m+1}\}$ is a Gaussian process, and numerous techniques exist for its analysis. Lemma 19 shows that for any $\kappa > 0$ there exists an $\epsilon > 0$ such that $\mathbb{E} \sup_{\|\psi\|_\infty < \epsilon} |Z_N(\psi)| < \kappa$. Thus $\mathbb{E} \sup_{\|\psi\|_\infty < \epsilon} |G_N(\psi)| < \sqrt{2\pi}\,\kappa$, and by Markov's inequality,

$$\Pr\Big\{ \sup_{\|\psi\|_\infty < \epsilon} |G_N(\psi)| > \delta \Big\} \le \sqrt{2\pi}\,\frac{\kappa}{\delta},$$

for any $\delta > 0$. The proof follows with $\nu = \sqrt{2\pi}\,\kappa/\delta$. It remains to prove Lemma 19.

□

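Lemma 19. For any $\kappa > 0$ there exists $\epsilon > 0$ such that

$$\mathbb{E} \sup_{\|\psi\|_\infty < \epsilon} |Z_N(\psi)| < \kappa.$$

Proof. Without loss of generality, assume that $\epsilon < \frac{1}{m+1}$. Lemma 20 shows that

$$\mathbb{E}_\Phi \sup_{\|\psi\|_\infty < \epsilon} |Z_N(\psi)| \le K_1 \sqrt{C_\epsilon(\{\Phi_n\})}, \tag{34}$$

where $K_1$ is a finite, positive constant, and $C_\epsilon(\{\Phi_n\})$ is the average number of times $|\Phi_1|, \dots, |\Phi_N|$ is greater than or equal to $1/2 - (m+1)\epsilon$. That is,

$$C_\epsilon(\{\Phi_n\}) = \frac{1}{N} \sum_{n=1}^{N} I_\epsilon(|\Phi_n|) \tag{35}$$

where $I_\epsilon(|\Phi_n|)$ is 1 when $|\Phi_n| \ge 1/2 - (m+1)\epsilon$ and zero otherwise. Recall that $f$ is the probability density function of $\Phi_1$ and (by assumption in Theorem 2) that $f(\langle x \rangle)$ is continuous at $x = -1/2$. Because of this, the expected value of $C_\epsilon(\{\Phi_n\})$ is small when $\epsilon$ is small, since

$$\begin{aligned}
\mathbb{E} C_\epsilon(\{\Phi_n\}) &= \frac{1}{N} \sum_{n=1}^{N} \mathbb{E} I_\epsilon(|\Phi_n|) \\
&= \Pr\{ |\Phi_1| \ge 1/2 - (m+1)\epsilon \} \\
&= \int_{-1/2}^{-1/2 + (m+1)\epsilon} f(\phi)\,d\phi + \int_{1/2 - (m+1)\epsilon}^{1/2} f(\phi)\,d\phi \\
&= \int_{-1/2 - (m+1)\epsilon}^{-1/2 + (m+1)\epsilon} f(\langle \phi \rangle)\,d\phi \\
&= 2(m+1)\epsilon \big( f(-1/2) + o(1) \big),
\end{aligned}$$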






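where $o(1)$ goes to zero as $\epsilon$ goes to zero. Since $\sqrt{\cdot}$ is a concave function on the positive real line and $C_\epsilon(\{\Phi_n\})$ is nonnegative, it follows from Jensen's inequality that $\mathbb{E}\sqrt{C_\epsilon(\{\Phi_n\})} \le \sqrt{\mathbb{E} C_\epsilon(\{\Phi_n\})} \le \sqrt{K_2 \epsilon}$ for some constant $K_2$. Applying $\mathbb{E}$ to both sides of (34) gives

$$\mathbb{E} \sup_{\|\psi\|_\infty < \epsilon} |Z_N(\psi)| \le K_1 \sqrt{\mathbb{E} C_\epsilon(\{\Phi_n\})} \le K_1 \sqrt{K_2 \epsilon}.$$

Choosing $\epsilon = \kappa^2 / (K_1^2 K_2)$ completes the proof. It remains to prove Lemma 20.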

□ 

The proofs of Lemmas 20 and 22 are based on a technique called chaining 
(or bracketing) [21-24, 48]. The proofs here follow those of Pollard [22]. 
In the remaining lemmas we consider expectation conditional on $\{\Phi_n\}$ and treat $\{\Phi_n\}$ as a fixed realisation. We consequently use the abbreviations $C_\epsilon = C_\epsilon(\{\Phi_n\})$ and $f_{nN}(\psi) = f_{nN}(\psi, \Phi_n)$. As in Lemma 19 we assume, without loss of generality, that $\epsilon < \frac{1}{m+1}$.

Lemma 20. There exists a positive constant $K_1$ such that

$$\mathbb{E}_\Phi \sup_{\|\psi\|_\infty < \epsilon} |Z_N(\psi)| \le K_1 \sqrt{C_\epsilon}.$$

Proof. Let $B_\epsilon = \{x \in \mathbb{R}^{m+1} ; \|x\|_\infty < \epsilon\}$. For each nonnegative integer $k$, let $T_\epsilon(k)$ be a discrete subset of $\mathbb{R}^{m+1}$ with the property that for every $\psi \in B_\epsilon$ there exists some $\psi^* \in T_\epsilon(k)$ such that the pseudometric

$$d(\psi, \psi^*) = \sum_{n=1}^{N} \big( f_{nN}(\psi) - f_{nN}(\psi^*) \big)^2 \le 2^{-k} C_\epsilon N.$$

We define $T_\epsilon(0)$ to contain a single point, the origin $\mathbf{0}$. Defined this way $T_\epsilon(0)$ satisfies the inequality above because $d(\psi, \mathbf{0}) = \sum_{n=1}^{N} f_{nN}(\psi)^2 \le C_\epsilon N$ for all $\psi \in B_\epsilon$, as a result of Lemma 21.

The existence of $T_\epsilon(k)$ for each positive integer $k$ will be proved in Lemma 24. It is worth giving some intuition regarding $T_\epsilon(k)$. If we place a 'ball' of radius $2^{-k} C_\epsilon N$ with respect to the pseudometric $d(\cdot, \cdot)$ around each point in $T_\epsilon(k)$, then, by definition, the union of these balls is a superset of $B_\epsilon$. The balls are said to cover $B_\epsilon$ and $T_\epsilon(k)$ is said to form a covering of $B_\epsilon$ [24, Section 1.2]. The minimum number of such balls required to cover $B_\epsilon$ is called a covering number of $B_\epsilon$. In Lemma 24 we show that no more than $K_3 2^{(m+1)k}$ balls of radius $2^{-k} C_\epsilon N$ are required to cover $B_\epsilon$, that is $|T_\epsilon(k)| \le K_3 2^{(m+1)k}$, where $K_3$ is a constant, independent of $N$ and $\epsilon$.

Since $f_{nN}(\psi) = (\frac{n}{N})^\ell \lceil \Phi_n + p_{nN}(\psi) \rfloor$ is a multiple of $N^{-\ell}$ for all $\psi \in \mathbb{R}^{m+1}$, it follows that $d(\psi, \psi^*)$ is a multiple of $N^{-2\ell}$. When $2^k > C_\epsilon N^{1+2\ell}$ we have $0 \le d(\psi, \psi^*) \le 2^{-k} C_\epsilon N < N^{-2\ell}$, and so $d(\psi, \psi^*) = 0$, and consequently $f_{nN}(\psi) = f_{nN}(\psi^*)$ for every $n = 1, \dots, N$ and $Z_N(\psi) = Z_N(\psi^*)$. Thus,

$$\sup_{\psi \in B_\epsilon} |Z_N(\psi)| = \sup_{\psi \in T_\epsilon(k)} |Z_N(\psi)|$$

for all $k$ large enough that $2^k > C_\epsilon N^{1+2\ell}$. So, to analyse the supremum of $Z_N(\psi)$ over the continuous region $B_\epsilon$ it is enough to analyse the supremum over the discrete set $T_\epsilon(k)$ for large $k$. Lemma 22 shows that

$$\mathbb{E}_\Phi \sup_{\psi \in T_\epsilon(k)} |Z_N(\psi)| \le \sqrt{C_\epsilon} \sum_{i=1}^{k} \frac{\sqrt{i A_1 + A_2}}{2^{i/2}}$$

for every positive integer $k$, where $A_1 = 18(m+1)\log 2$ and $A_2 = 18 \log K_3$ are constants and $\log(\cdot)$ is the natural logarithm. The lemma holds with $K_1 = \sum_{i=1}^{\infty} 2^{-i/2} \sqrt{i A_1 + A_2}$. □

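Lemma 21. For $\epsilon < \frac{1}{m+1}$ and all $\psi \in B_\epsilon$ and $n = 1, \dots, N$,

$$f_{nN}(\psi)^2 \le |f_{nN}(\psi)| \le I_\epsilon(|\Phi_n|)$$

and consequently,

$$\sum_{n=1}^{N} f_{nN}(\psi)^2 \le \sum_{n=1}^{N} |f_{nN}(\psi)| \le C_\epsilon N. \tag{36}$$

Proof. Recall that $f_{nN}(\psi) = (\frac{n}{N})^\ell \lceil \Phi_n + p_{nN}(\psi) \rfloor$. Because $|\psi_i| < \epsilon$ for all $i = 0, \dots, m$,

$$|p_{nN}(\psi)| = \Big| \sum_{i=0}^{m} \Big(\frac{n}{N}\Big)^i \psi_i \Big| \le (m+1)\epsilon < 1. \tag{37}$$

Since $\Phi_n \in [-1/2, 1/2)$, it follows that $f_{nN}(\psi)$ equals either $-(\frac{n}{N})^\ell$, $0$ or $(\frac{n}{N})^\ell$ and so

$$f_{nN}(\psi)^2 \le |f_{nN}(\psi)| \le 1. \tag{38}$$

Whenever $f_{nN}(\psi) \ne 0$ we must have

$$|\Phi_n| \ge 1/2 - |p_{nN}(\psi)| \ge 1/2 - (m+1)\epsilon$$

and $I_\epsilon(|\Phi_n|) = 1$. Thus, for all $n = 1, \dots, N$,

$$f_{nN}(\psi)^2 \le |f_{nN}(\psi)| \le I_\epsilon(|\Phi_n|).$$

Summing the terms in this inequality over $n = 1, \dots, N$ and using (35) gives (36). □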
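The pointwise inequality of Lemma 21 can be checked directly on simulated data. The following is a minimal sketch; the names `round_half_up`, `f_nN` and `indicator` are hypothetical, and the rounding convention assumes that $x - \lceil x \rfloor$ lies in $[-1/2, 1/2)$.

```python
import math


def round_half_up(x):
    """Nearest integer with halves rounded up, so x - round_half_up(x) is in [-1/2, 1/2)."""
    return math.floor(x + 0.5)


def f_nN(n, N, ell, psi, phi_n):
    """f_nN(psi) = (n/N)^ell * round(phi_n + p_nN(psi)), with p_nN the scaled polynomial."""
    p = sum((n / N) ** k * c for k, c in enumerate(psi))
    return (n / N) ** ell * round_half_up(phi_n + p)


def indicator(phi_n, m, eps):
    """I_eps(|phi_n|): 1 when |phi_n| >= 1/2 - (m + 1) * eps, and 0 otherwise."""
    return 1 if abs(phi_n) >= 0.5 - (m + 1) * eps else 0
```

The check illustrates why only the phases within $(m+1)\epsilon$ of the wrapping boundary $\pm 1/2$ contribute to $f_{nN}$, which is what makes $C_\epsilon$ the natural scale in Lemma 20.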

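Lemma 22. (Chaining) For all positive integers $k$,

$$\mathbb{E}_\Phi \sup_{\psi \in T_\epsilon(k)} |Z_N(\psi)| \le \sqrt{C_\epsilon} \sum_{i=1}^{k} \frac{\sqrt{i A_1 + A_2}}{2^{i/2}}$$

where the constants $A_1 = 18(m+1)\log 2$ and $A_2 = 18 \log K_3$.

Proof. Let $b_k$ be a function that maps each $\psi \in T_\epsilon(k)$ to $b_k(\psi) \in T_\epsilon(k-1)$ such that $d(\psi, b_k(\psi)) \le 2^{1-k} C_\epsilon N$. The existence of the function $b_k$ is guaranteed by the definition of $T_\epsilon(k)$. By the triangle inequality,

$$|Z_N(\psi)| \le |Z_N(b_k(\psi))| + |Z_N(\psi) - Z_N(b_k(\psi))|,$$

and by taking supremums on both sides,

$$\begin{aligned}
\sup_{\psi \in T_\epsilon(k)} |Z_N(\psi)| &\le \sup_{\psi \in T_\epsilon(k)} |Z_N(b_k(\psi))| + \sup_{\psi \in T_\epsilon(k)} |Z_N(\psi) - Z_N(b_k(\psi))| \\
&\le \sup_{\psi \in T_\epsilon(k-1)} |Z_N(\psi)| + \sup_{\psi \in T_\epsilon(k)} |Z_N(\psi) - Z_N(b_k(\psi))|,
\end{aligned} \tag{39}$$

the last line following since $b_k(\psi) \in T_\epsilon(k-1)$ and so

$$\sup_{\psi \in T_\epsilon(k)} |Z_N(b_k(\psi))| \le \sup_{\psi \in T_\epsilon(k-1)} |Z_N(\psi)|.$$

Conditional on $\Phi_1, \Phi_2, \dots$ the random variable $X(\psi) = Z_N(\psi) - Z_N(b_k(\psi))$ has zero mean, and is normally distributed with variance

$$\frac{1}{N} \sum_{n=1}^{N} \mathbb{E} g_n^2 \big( f_{nN}(\psi) - f_{nN}(b_k(\psi)) \big)^2 = \frac{1}{N} \sum_{n=1}^{N} \big( f_{nN}(\psi) - f_{nN}(b_k(\psi)) \big)^2 = \frac{1}{N} d(\psi, b_k(\psi)) \le 2^{1-k} C_\epsilon,$$

because $\mathbb{E} g_n^2 = 1$. Using Lemma 23,

$$\mathbb{E}_\Phi \sup_{\psi \in T_\epsilon(k)} |X(\psi)| \le 3\sqrt{2^{1-k} C_\epsilon \log |T_\epsilon(k)|} \le \sqrt{C_\epsilon}\, \frac{\sqrt{k A_1 + A_2}}{2^{k/2}}$$

because $\log |T_\epsilon(k)| \le k(m+1)\log 2 + \log K_3$. Taking expectations on both sides of (39) gives

$$\mathbb{E}_\Phi \sup_{\psi \in T_\epsilon(k)} |Z_N(\psi)| \le \mathbb{E}_\Phi \sup_{\psi \in T_\epsilon(k-1)} |Z_N(\psi)| + \sqrt{C_\epsilon}\, \frac{\sqrt{k A_1 + A_2}}{2^{k/2}}$$

which involves a recursion in $k$. By unravelling the recursion, and using the fact that $T_\epsilon(0)$ contains only the origin, and therefore

$$\mathbb{E}_\Phi \sup_{\psi \in T_\epsilon(0)} |Z_N(\psi)| = \mathbb{E}_\Phi |Z_N(\mathbf{0})| = 0,$$

we obtain $\mathbb{E}_\Phi \sup_{\psi \in T_\epsilon(k)} |Z_N(\psi)| \le \sqrt{C_\epsilon} \sum_{i=1}^{k} \frac{\sqrt{i A_1 + A_2}}{2^{i/2}}$ as required. □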

Lemma 23. (Maximal inequality) Suppose $X_1, \dots, X_N$ are zero mean Gaussian random variables each with variance less than some positive constant $K$, then $\mathbb{E} \sup_{n=1,\dots,N} |X_n| \le 3\sqrt{K \log N}$ where $\log N$ is the natural logarithm of $N$.

Proof. This result is well known, see for example [22, Section 3] □ 
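The maximal inequality of Lemma 23 can be compared with a Monte Carlo estimate. The sketch below makes two simplifying assumptions that the lemma itself does not require: the variables are independent, and each has variance exactly $K$. The function names are hypothetical.

```python
import math


def expected_max_abs(n_vars, variance, trials, rng):
    """Monte Carlo estimate of E max_n |X_n| for n_vars i.i.d. N(0, variance) variables."""
    sd = math.sqrt(variance)
    total = 0.0
    for _ in range(trials):
        total += max(abs(rng.gauss(0.0, sd)) for _ in range(n_vars))
    return total / trials


def maximal_bound(n_vars, variance):
    """The bound 3 * sqrt(K log N) from Lemma 23."""
    return 3.0 * math.sqrt(variance * math.log(n_vars))
```

The bound grows only like $\sqrt{\log N}$, which is why the logarithm of the covering number, rather than the covering number itself, enters the chaining argument of Lemma 22.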

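Lemma 24. (Covering numbers) For $k \in \mathbb{Z}$ there exists a discrete set $T_\epsilon(k) \subset \mathbb{R}^{m+1}$ with the property that, for every $\psi \in B_\epsilon$, there is a $\psi^* \in T_\epsilon(k)$ such that,

$$d(\psi, \psi^*) = \sum_{n=1}^{N} \big( f_{nN}(\psi) - f_{nN}(\psi^*) \big)^2 \le \frac{C_\epsilon N}{2^k}.$$

The number of elements in $T_\epsilon(k)$ is no more than $K_3 2^{(m+1)k}$ where $K_3$ is a positive constant, independent of $N$, $\epsilon$ and $k$.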

Before we give the proof of this lemma we need some results from the literature on hyperplane arrangements and what are called $\epsilon$-cuttings [25, 26]. Let $H$ be a set of $m$-dimensional affine hyperplanes lying in $\mathbb{R}^{m+1}$. By affine it is meant that the hyperplanes need not pass through the origin. For each hyperplane $h \in H$ let $D(h)$ and its complement $\bar{D}(h)$ be the corresponding half spaces of $\mathbb{R}^{m+1}$. For a point $x \in \mathbb{R}^{m+1}$, let

$$b(h, x) = \begin{cases} 1, & x \in D(h) \\ 0, & x \in \bar{D}(h). \end{cases} \tag{40}$$

Note that $b(h, x)$ is piecewise constant in $x$. Two points $x$ and $y$ from $\mathbb{R}^{m+1}$ are in the same halfspace of $h$ if and only if $b(h, x) = b(h, y)$. So the pseudometric

$$\sigma(x, y) = \sum_{h \in H} |b(h, x) - b(h, y)|$$

is the number of hyperplanes in $H$ that pass between the points $x$ and $y$.

The next theorem considers the partitioning of $\mathbb{R}^{m+1}$ into subsets so that not too many hyperplanes intersect with any subset. Proofs can be found in Theorem 5.1 on page 206 of [25] and also Theorem 6.5.3 on page 144 of [26].

Theorem 3. There exists a constant $K$, independent of the set of hyperplanes $H$, such that for any positive real number $r$, we can partition $\mathbb{R}^{m+1}$ into $K r^{m+1}$ generalised $(m+1)$-dimensional simplices with the property that no more than $|H|/r$ hyperplanes from $H$ pass through the interior of any simplex.

By the phrase 'There exists a constant $K$, independent of the set of hyperplanes $H$' it is meant that the constant $K$ is valid for every possible set of hyperplanes in $\mathbb{R}^{m+1}$, regardless of the number of hyperplanes or their position and orientation. A generalised $(m+1)$-dimensional simplex is the region defined by the intersection of $m + 2$ half spaces in $\mathbb{R}^{m+1}$. Note that a generalised simplex (unlike an ordinary simplex) can be unbounded. For our purposes Theorem 3 is important because of the following corollary.

Corollary 3. There exists a constant $K$, independent of the set of hyperplanes $H$, such that for every positive real number $r$ there is a discrete subset $T \subset \mathbb{R}^{m+1}$ containing no more than $K r^{m+1}$ elements with the property that for every $x \in \mathbb{R}^{m+1}$ there exists $y \in T$ with $\sigma(x, y) \le |H|/r$.

Proof. Let $C$ be the set of generalised simplices constructed according to Theorem 3. Define $T$ as a set containing precisely one point from the interior of each simplex in $C$. Let $x \in \mathbb{R}^{m+1}$. Since $b(h, x)$ is piecewise constant for each $h \in H$, and since $C$ partitions $\mathbb{R}^{m+1}$, there must exist a simplex $c \in C$ with a point $z$ in its interior such that $b(h, z) = b(h, x)$ for all $h \in H$, and correspondingly $\sigma(z, x) = 0$. Let $y$ be the element from $T$ that is in the interior of $c$. Since at most $|H|/r$ hyperplanes cross the interior of $c$ there can be at most $|H|/r$ hyperplanes between $z$ and $y$, and so $\sigma(z, y) \le |H|/r$. Now $\sigma(x, y) \le \sigma(z, x) + \sigma(z, y) \le |H|/r$ follows from the triangle inequality. □

The previous corollary ensures that we can cover $\mathbb{R}^{m+1}$ using $K r^{m+1}$ 'balls' of radius $|H|/r$ with respect to the pseudometric $\sigma(x, y)$. The balls are placed at the positions defined by points in the set $T$, and these points could be anywhere in $\mathbb{R}^{m+1}$. The next corollary asserts that we can cover a subset of $\mathbb{R}^{m+1}$, by placing the balls at points only within this subset.
Corollary 4. Let $B$ be a subset of $\mathbb{R}^{m+1}$. There exists a constant $K$, independent of the set of hyperplanes $H$, such that for every positive real number $r$ there is a discrete subset $T_B \subset B$ containing no more than $K r^{m+1}$ elements with the property that for every $x \in B$ there exists $y \in T_B$ with $\sigma(x, y) \le |H|/r$.

Proof. Let $C$ be the set of generalised simplices constructed according to Theorem 3 and let $C_B$ be the subset of those simplices that intersect $B$. Let $T_B$ contain a point from $c \cap B$ for each simplex $c \in C_B$. The proof now follows similarly to Corollary 3. □

We are now ready to prove Lemma 24. 

Proof. (Lemma 24) Put $g_{nN}(\psi) = (\frac{N}{n})^\ell f_{nN}(\psi) = \lceil \Phi_n + p_{nN}(\psi) \rfloor$, and let

$$d_g(\psi, \psi^*) = \sum_{n=1}^{N} \big( g_{nN}(\psi) - g_{nN}(\psi^*) \big)^2.$$

We have $d(\psi, \psi^*) \le d_g(\psi, \psi^*)$ and so it suffices to prove the lemma with $d$ replaced by $d_g$. From (37) it follows that $|p_{nN}(\psi)| \le (m+1)\epsilon < 1$. Since $\Phi_n \in [-1/2, 1/2)$, when $\Phi_n \ge 0$,

$$g_{nN}(\psi) = \begin{cases} 1, & p_{nN}(\psi) \ge 1/2 - \Phi_n \\ 0, & \text{otherwise,} \end{cases}$$

and when $\Phi_n < 0$,

$$g_{nN}(\psi) = \begin{cases} -1, & p_{nN}(\psi) < -1/2 - \Phi_n \\ 0, & \text{otherwise.} \end{cases}$$

Thus, $(g_{nN}(\psi) - g_{nN}(\psi^*))^2$ is either equal to one when $g_{nN}(\psi) \ne g_{nN}(\psi^*)$ or zero when $g_{nN}(\psi) = g_{nN}(\psi^*)$. Now $g_{nN}(\psi) \ne 0$ only if

$$|\Phi_n| \ge 1/2 - |p_{nN}(\psi)| \ge 1/2 - (m+1)\epsilon,$$

that is, only if $I_\epsilon(\Phi_n) = 1$. Let $A = \{n \in \{1, \dots, N\} ; I_\epsilon(\Phi_n) = 1\}$ be the subset of the indices where $I_\epsilon(\Phi_n) = 1$. By definition the number of elements in $A$ is $C_\epsilon N$ (see (35)). If both $\psi$ and $\psi^*$ are in $B_\epsilon$, then

$$\big( g_{nN}(\psi) - g_{nN}(\psi^*) \big)^2 \ne 0$$

only if $n \in A$. Thus,

$$d_g(\psi, \psi^*) = \sum_{n=1}^{N} \big( g_{nN}(\psi) - g_{nN}(\psi^*) \big)^2 = \sum_{n \in A} \big( g_{nN}(\psi) - g_{nN}(\psi^*) \big)^2.$$

We now use Corollary 4. Let $h_n$ be the $m$ dimensional hyperplane in $\mathbb{R}^{m+1}$ satisfying

$$p_{nN}(\psi) = \sum_{i=0}^{m} \Big(\frac{n}{N}\Big)^i \psi_i = \tfrac12 \operatorname{sgn}(\Phi_n) - \Phi_n$$

where $\operatorname{sgn}(\Phi_n)$ is equal to 1 when $\Phi_n \ge 0$ and $-1$ otherwise. The hyperplane $h_n$ divides $\mathbb{R}^{m+1}$ into two halfspaces, $D(h_n)$ and its complement $\bar{D}(h_n)$. If $\psi$ and $\psi^*$ are in the same halfspace, then $|b(h_n, \psi) - b(h_n, \psi^*)| = 0$ and $g_{nN}(\psi) = g_{nN}(\psi^*)$, and therefore $(g_{nN}(\psi) - g_{nN}(\psi^*))^2 = 0$. Otherwise, if $\psi$ and $\psi^*$ are in different halfspaces, then $|b(h_n, \psi) - b(h_n, \psi^*)| = 1$ and $g_{nN}(\psi) \ne g_{nN}(\psi^*)$, and therefore $(g_{nN}(\psi) - g_{nN}(\psi^*))^2 = 1$. Thus,

$$\big( g_{nN}(\psi) - g_{nN}(\psi^*) \big)^2 = |b(h_n, \psi) - b(h_n, \psi^*)|$$

for $n = 1, \dots, N$. Let $H$ be the finite set of hyperplanes $\{h_n, n \in A\}$ and observe that the number of hyperplanes is $|H| = |A| = C_\epsilon N$. When both $\psi$ and $\psi^*$ are inside $B_\epsilon$, $d_g$ can be written as

$$d_g(\psi, \psi^*) = \sum_{n \in A} |b(h_n, \psi) - b(h_n, \psi^*)| = \sum_{h \in H} |b(h, \psi) - b(h, \psi^*)| = \sigma(\psi, \psi^*).$$

That is, when both $\psi, \psi^* \in B_\epsilon$, $d_g(\psi, \psi^*)$ is the number of hyperplanes from $H$ that pass between the points $\psi$ and $\psi^*$.

It follows from Corollary 4 that for any positive $r$ there exists a finite subset $T_B$ of $B_\epsilon$ containing at most $K_3 r^{m+1}$ elements, such that for every $\psi \in B_\epsilon$ there is a $\psi^* \in T_B$ with

$$d_g(\psi, \psi^*) = \sigma(\psi, \psi^*) \le \frac{|H|}{r} = \frac{|A|}{r} = \frac{C_\epsilon N}{r}.$$

Putting $r = 2^k$ and choosing $T_\epsilon(k) = T_B$ completes the proof. □
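The identification of $d_g$ with the hyperplane-counting pseudometric $\sigma$ can be illustrated numerically. The sketch below assumes the stronger condition $(m+1)\epsilon < 1/2$, chosen here so that for each $n$ only one rounding boundary is reachable; the function names are hypothetical and `side` plays the role of $b(h_n, \cdot)$.

```python
import math


def round_half_up(x):
    """Nearest integer with halves rounded up."""
    return math.floor(x + 0.5)


def p_nN(n, N, psi):
    """p_nN(psi) = sum_i (n/N)^i psi_i."""
    return sum((n / N) ** i * c for i, c in enumerate(psi))


def g_nN(n, N, psi, phi_n):
    """g_nN(psi) = round(phi_n + p_nN(psi))."""
    return round_half_up(phi_n + p_nN(n, N, psi))


def side(n, N, psi, phi_n):
    """b(h_n, psi): 1 if p_nN(psi) >= sgn(phi_n)/2 - phi_n, and 0 otherwise."""
    sgn = 1.0 if phi_n >= 0 else -1.0
    return 1 if p_nN(n, N, psi) >= 0.5 * sgn - phi_n else 0


def d_g(psi, psi_star, phi):
    """Sum over n of (g_nN(psi) - g_nN(psi_star))^2."""
    N = len(phi)
    return sum((g_nN(n, N, psi, p) - g_nN(n, N, psi_star, p)) ** 2
               for n, p in enumerate(phi, start=1))


def sigma(psi, psi_star, phi):
    """Number of hyperplanes h_n that pass between psi and psi_star."""
    N = len(phi)
    return sum(abs(side(n, N, psi, p) - side(n, N, psi_star, p))
               for n, p in enumerate(phi, start=1))
```

Here `sigma` sums over all $n$ rather than only $n \in A$; this gives the same value for points in $B_\epsilon$ because the hyperplanes with $n \notin A$ do not separate any two such points.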